Preface



The Second International Working Conference on Active 
Networks — IWAN2000 

We are very happy to present the proceedings of “The Second International 
Working Conference on Active Networks - IWAN2000”. The proceedings contain
the technical program for IWAN2000, which will be held on 16-18 October 2000 
in Tokyo, Japan. 

The increasing demand for highly sophisticated functionalities of networks 
makes the research on active networks, which aims to make network nodes
programmable and intelligent, more and more important. IWAN2000 is the second
workshop in this new area and is the best opportunity for experts in this new 
area to communicate. 

The first conference on active networks, IWAN’99, was held in Berlin in July 
1999 with 30 highly informative papers. We also selected 30 papers this year. The 
topics discussed in IWAN’99 covered a wide research area in active networks from 
architecture to applications. Though IWAN2000 covers an essentially similar
area, a wider range of topics on applications, such as multicast control, QoS
management, and Mobile IP, increases the productivity of the conference greatly.
Although we still received many submissions on architecture and basic issues, the
increase in submissions on new applications shows that research
on active networks has begun to focus on practical issues.

We would like to thank all the authors who submitted their work and deeply 
regret that we could not select many more papers for this conference.
Presentation time is limited in a 3-day conference. As we had so many submissions, I,
as the program committee chair of IWAN2000, had to encourage colleagues to serve
as part of the review process. I was able to get an outstanding group, and
each of them read and reviewed an incredible number of papers, which resulted 
in a successful conference program. I deeply appreciate all their efforts. 

We hope the readers will find these proceedings helpful in their future re- 
search. Also, we hope that IWAN will continue to be the center of the interna- 
tional community on active networks, and contribute to the research progress of 
this new area in realizing future generation network architecture. 

IWAN2000

Message from the Conference General Chairs 



On behalf of the IWAN international committee, welcome to the second Inter- 
national Working Conference on Active Networks, IWAN2000. 

Internet technologies continue to enhance the infrastructure of the emerging 
information society. The new developments increasingly affect both our personal 
lives and our work. In order to support this rapidly expanding infrastructure, 
various technologies are evolving. High speed data transfer systems such as wave- 
length division multiplexing (WDM) equipment and faster IP routers are the 
key solutions for the increasing quantitative demands. Active networks are the 
key solution for providing the required quality. By making network nodes pro- 
grammable and intelligent, one can realize networks with highly sophisticated 
functionality. 

There is an increasing demand to find common denominators for what this 
functionality should be in the medium- and long-term perspectives and to find 
the best practice for its realization. 

It is our great pleasure to organize IWAN2000 in Tokyo. We hope to provide 
a forum for the international exchange of ideas and results for this important 
domain, continuing the discussions we had at IWAN99 in Berlin and to offer the 
opportunity for evaluating this emerging technology. 

The positive comments we received during the organizing phase confirm our 
belief in the importance of active networks and the success of this IWAN confer- 
ence. We would like to thank the many volunteers, committee members, authors, 
reviewers, invited speakers, and sponsors for their enthusiastic contributions. 

Abstract. This paper presents the main concepts of the IST Project
FAIN “Future Active IP Networks” [10], a three-year collaborative re- 
search project, whose main task is to develop and validate an open, 
flexible, programmable and dependable network architecture based on a 
novel active node approach. This generic architecture for active
networks is an innovative integration of active networking, distributed
object and mobile agent technology. Starting from the definition of a
business model that underlies the FAIN architecture, we identify three key
working areas for contribution: the active node platform layer, the 
service programming environment and a built-in management system. 

The active node platform layer of the FAIN Active Node is comprised 
of the kernel Operating System, a node resource access control
framework, and active components for management, security and service
provision. These elements provide the foundations for Execution
Environments so that they can operate in an independent manner. A novel
service programming environment is envisaged to enable the dynamic
creation or update and the secure deployment and operation of protocols.
Such an environment supports various role-specific ways of deploy- 
ment, e.g. application-specific signalling or operator-governed network 
control signalling. 



1 Introduction 

The wide acceptance of IP has enabled the provision of new application services. The 
popularity of IP originates from its unparalleled ability to provide ubiquitous access 
and low prices regardless of underlying networking technology. These services can be 
offered on a global scale by almost everyone, simply by connecting a new web server 
to the Internet. Today IP [20] is considered a unique bridge for diverse applica- 
tion/user requirements with broadband transfer capability. 

However, the development and deployment of new network services, i.e. services that
operate on the IP layer, through best practice and standardisation is too slow. It cannot
match the rapid growth of requirements in various applications. Examples of such 
services include signalling for quality of service (QoS), reliable multicast or Web 
Proxies/Caches/Switches/Filters. As with the intelligent network (IN) architecture in 
the PSTN world, the current Internet architecture needs to be enhanced in order to 
allow for a more rapid introduction and programmability of such services. 

The Internet community has realised the need for network-embedded functionality, 
and has been addressing these needs on a problem-centric basis rather than on an 
architectural basis. For example, various approaches to differentiated-service
architectures have applied the idea of active queue management in routers, resulting in
algorithms such as RED and FRED, which provide a reasonable class of service
performance. As a second example, the MBONE architecture identifies MBONE 
flows, which are segregated from "regular" traffic by participating routers, among 
which, traffic is "tunnelled". 
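
As an illustration of the kind of network-embedded logic mentioned above, the following is a minimal sketch of the drop decision in Random Early Detection (RED). It is a simplified textbook form, not code from the FAIN project: it keeps an exponentially weighted average of the queue length and drops with a probability that rises linearly between two thresholds, and it omits the packet-count correction and idle-time handling of the full algorithm. The class name and parameter names are assumptions chosen for this example.

```python
import random


class RedQueue:
    """Simplified RED drop decision (illustrative sketch only)."""

    def __init__(self, min_th, max_th, max_p, weight=0.002, rng=None):
        if not 0 <= min_th < max_th:
            raise ValueError("require 0 <= min_th < max_th")
        self.min_th = min_th      # below this average, never drop
        self.max_th = max_th      # at or above this average, always drop
        self.max_p = max_p        # drop probability just below max_th
        self.weight = weight      # EWMA weight for the average queue length
        self.avg = 0.0
        self.rng = rng or random.Random()

    def update_average(self, queue_len):
        # Exponentially weighted moving average of the instantaneous queue.
        self.avg = (1.0 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        # Linear ramp between the two thresholds.
        span = self.max_th - self.min_th
        return self.max_p * (self.avg - self.min_th) / span

    def should_drop(self, queue_len):
        self.update_average(queue_len)
        return self.rng.random() < self.drop_probability()
```

A router would call `should_drop` once per arriving packet with the current queue length. The point of the sketch is only that such per-router logic is fixed in today's equipment, whereas an active network would let it be replaced on demand.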

Active Networks (AN) were originally proposed [25] as an architectural solution
for the fast and flexible deployment of new network services. The basic idea of active
networks is to enable third parties (end users, operators, and service providers) to
inject application-specific services (in the form of code) into the networks.
Applications are thus able to utilise these services to obtain required support in terms of
network and network management resources, thus becoming network-aware. As such,
active networks allow dynamic injection of code for realisation of application-specific
service logic, or perform dynamic service provision on demand. But the dynamic
injection of code can only be acceptable to network providers if it does not
compromise the integrity, the performance and/or the security of networks. Therefore viable
architectures for active networks have to be carefully engineered to achieve suitable 
trade-offs among flexibility, performance, security and manageability. 

Programmability, a fundamental requirement of active networks, means that a user is
able to control dynamically the network to process packets in a required way, rather 
than in a fixed fashion, e.g., by selecting/implementing preferred routing algorithms.
Active networks normally implement code distribution and code execution mecha- 
nisms to enable such ability, so that injecting code programmed by users can enforce 
control of the networks. Another facet of programmability is that network services 
(developed as active network applications) use the open router interface (implemented 
as an API) to dynamically manage router resources. 

The IEEE P1520 [4], [9] standardisation effort addresses the need for a set of standard
software interfaces for programming of networks in terms of rapid service creation 
and open signalling. The technology under consideration spans from ATM switches 
and IP routers, to circuit or hybrid switches. Well-defined open interfaces represent 
abstractions of resources of the underlying physical network devices and are imple- 
mented as distributed computing objects. These open interfaces allow service provid- 
ers and network operators to manipulate the states of the network through the use of 
middleware toolkits in order to construct and manage new network services. 

Active networks research has made it clear that software-implemented active
network elements and simple applications such as active pings and TCP through active
bridges can perform in the 10-100 Mbps range. ANTS [26] and PLAN [21] have
shown that capsules are in fact a viable concept, and in the case of PLAN, that
minimal-function capsules can perform reasonably. SwitchWare [16], [23] has
demonstrated that mutually distrustful nodes can support remote-module loading using
cryptographic hashes. ALIEN [2] has demonstrated that the distinction between the
“programmable switch” (active extension) and “capsule” (active packet) models is a
distraction rather than a central issue in active networks. An approach for designing
a high-performance active network was demonstrated in [8].
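
The idea of loading a remote module only if it matches a cryptographic hash can be sketched as follows. This is an assumption-laden illustration, not the mechanism of SwitchWare or of any other cited system: SHA-256, the function names and the exception type are all chosen for this example, and Python's `exec` is used only to keep the sketch short (it is not a sandbox).

```python
import hashlib


class ModuleRejected(Exception):
    """Raised when received code does not match the expected digest."""


def module_digest(code_bytes):
    # Hypothetical naming scheme: a module is identified by its SHA-256 digest.
    return hashlib.sha256(code_bytes).hexdigest()


def load_verified_module(code_bytes, expected_digest):
    """Load code only if its digest equals the one the node already trusts."""
    actual = module_digest(code_bytes)
    if actual != expected_digest:
        raise ModuleRejected("digest mismatch: refusing to load module")
    namespace = {}
    # NOTE: exec provides no isolation; a real node would run the module
    # inside a restricted execution environment.
    exec(compile(code_bytes.decode("utf-8"), "<remote-module>", "exec"), namespace)
    return namespace
```

Because the receiving node compares against a digest it obtained beforehand, it does not need to trust the node that delivered the bytes, which is the property the paragraph above refers to.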

What has yet to be designed is an integrated hardware/software system, which per- 
forms to a high level, is secure, flexible, programmable, manageable and usable by a 
variety of applications. Moreover, it provides the basis on which a number of Execu- 
tion Environments of different technology and goals will be deployed and tested 
against each other. These EEs may eventually take the form of different network ar- 
chitectures resulting in proper VPNs. 

The FAIN project [10] has been set up to demonstrate such a system, and to address key
problems of resource partition and control, security, dynamic loading of protocols,
and dynamic management in a new active network node environment. FAIN is a
three-year IST collaborative research project among universities, research establishments,
manufacturers and operators starting in May 2000. The project aims to develop an 
open, flexible, programmable and dependable network architecture based on novel 
active node concepts. The generic architecture for active networks is an innovative 
integration of active networking, distributed object and mobile agent technology. The 
project will contribute to the establishment and operation of a worldwide active net- 
works and services test bed. 

The remainder of the paper is organised as follows: Section 2 discusses the design of 
the FAIN Active Networks Architecture. Section 3 suggests the enterprise model on 
which FAIN is based. Section 4 addresses the node architecture. Section 5 addresses 
the management of the active networks. Section 6 identifies the FAIN testbed. Sec- 
tion 7 concludes and outlines challenges for the future. 




Alex Galis et al.



2 The FAIN Design Approach 

In defining the FAIN architecture, our goal was a system that would allow
experimentation and prototyping to test Active Networks ideas. From the architectural
viewpoint, FAIN defines a network architecture based on the combination of
traditional layer-based networking and distributed component-based networking. Such 
a combination of architecture design brings many advantages, for example, modularity 
and location transparency for service provisioning, which facilitates fast changes and 
smooth integration of new services and network technologies. 

Our design follows a bottom up approach, originating from the design of the AN plat- 
form layer and its components and moving towards the service programming envi- 
ronments held together by the built-in management system with the following charac- 
teristics. 

The active node platform layer of the FAIN Active Node is comprised of the kernel 
OS, a node resource access control framework, and active components for manage- 
ment, security and service provision. These elements provide the foundations on 
which Execution Environments are deployed and operate in an independent manner. 

A novel service programming environment enables the dynamic creation or update of 
protocols, and supports various role-specific ways of deployment, e.g.
application-specific signalling or PNO-governed network control signalling. This environment is
secure and maintains the interference-free execution semantics of different protocols 
or components, so that safe execution of protocols can be guaranteed and the network 
behaviour is predictable. 

In addition, FAIN proposes a management paradigm based on standardised API and 
autonomy of nodes. It is based on the approach identified in [11] and [12]. The para- 
digm enables the development of a fine-grained and more efficient management 
framework, which reduces needless traffic or information processing. Examples in- 
clude filtering and self-management of nodes, which take care of the management of 
their own resources and states. Autonomous management of nodes enables the distri- 
bution of management intelligence. Loosely coupled management functions facilitate 
the traditionally difficult tasks such as policy enforcement and integration of new 
managing functions. Re-usable components and interoperable operation can be 
achieved using the standard interface and an implementation using distributed objects 
and platforms. 

The FAIN architecture is based on a new enterprise model. In this way the FAIN ar- 
chitecture supports new business opportunities: 

• The project enables services to be delivered to Active Network Operators as prod- 
ucts that can be installed and executed according to service level agreements 
(SLA). 

• The project also identifies two new business players: active middleware providers, 
and active network solution providers. 

3 Enterprise and Network Models



Open telecommunications solutions will involve a variety of roles and players on the 
value chains with heterogeneous requirements. The FAIN project is based on an
enterprise model of the European Information Infrastructure [22] with some
enhancements, as dictated by an initial requirement analysis conducted by related active
networks projects [16]. The enterprise model is depicted in Figure 1. The definition
of the FAIN enterprise model specifies the business relations among various players in
future active networks.

Figure 1. Initial FAIN Enterprise Model



FAIN active network architecture 

Traditional network architectures are usually tied to the interface presented by the 
network to the user (e.g. in the form of UNI, NNI). An architecture for active net- 
works typically addresses the interface between AN providers and users, i.e. consum- 
ers and solution providers, and should be explicitly designed to allow for more than 
one such “network API”. Figure 2 depicts the initial network architecture
envisaged in the FAIN project. In addition, the project will address the requirements of
key functions in the provision of an active network:

• Service Provision: The objective of this activity is to identify the architectural
requirements for flexible support of a range of services, including application 
services and network services. Focus will be put on service execution and man- 
agement environment. Mechanisms for code distribution and code execution will 
be a major requirement. 

• Security Provision: The objective here is to define the architectural requirements 
for security provision in active networks. It focuses on three principal aspects: 

(1) Trust-based software distribution: only trusted parties are allowed to download 
their components into networks for special provisioning of services. This is realised by 
the operations of authorisation and authentication of user/code/node in active net- 
works. The authorisation could be based on a predefined service level agreement 
(SLA). 

(2) Trust-based software execution: only trusted components (based on a component 
level agreement) are allowed to access and manipulate a particular set of network 
resources (e.g. in the form of virtual networks) and sensitive information resources. 

(3) Interference-free software execution: software components run independently
for provision of services, and their interaction will be policed and controlled to ensure
that an abnormal execution of one component will not negatively impact other
components’ execution. Policy-controlled resource access and runtime resource
management should support this, together with independent execution environments
(EE), a guarantee of integrity of the shared data, and transaction-based interaction
among components.




Figure 2. Initial FAIN Active Network Architecture 

• Management Service: The objective here is to specify the requirements for managing
the active nodes, node resources and services. The focus will be on supporting service
provisioning, including resource management functions (partitioning and policing);
service management, including service installation, upgrade and supervision; and network
management functions such as configuration and monitoring of nodes and links.

• Network Integration: The objective of this part is to specify the system requirements
for integrating the functions for service provision, security, and management within an 
active node and over a unified active network. Of typical focus are interoperability 
and dependability (mainly reusability and reliability) of active nodes. 



4 FAIN Nodes and Networks 

Central to the active network technology is the active network node. In FAIN, the 
design of the AN Nodes will be based on the specification of FAIN Enterprise Model. 
We envisage a three-tier node architecture as depicted in Figure 3. It represents a 
generic framework for developing the elements in an active network. An active
network thus consists of a set of active nodes plus the possible addition of traditional
(“passive”) nodes, connected by a variety of network technologies, e.g. IP, IP/ATM, 
Ethernet, UMTS, etc. 

An active node is an essential network element in supporting the development of
value-added solutions by third parties, or direct use by consumers (users or application
developers) to inject customised services. It implements APIs according to the
requirements of reference points (R3, R4, R6 and R7 in Figure 3) defined in the business
model. To have an open and secure control of network resources, an active node is 
built upon one programmable network element, e.g. an IP router with an open inter- 
face. The computing platform in the node provides a layer through which down- 
loaded/injected components interact with networks and with each other. In general, it 
consists of a local operating system (e.g. Linux, FreeBSD or an embedded OS), one or
more distributed processing environments (e.g. TINA-DPE, or a mobile agent platform)
and system facilities.

Upon the platform, a set of execution environments (EE) will be created to host
service components. Solution providers can download such components according to
security agreements. They can also be dynamically injected by consumers to implement a
service for particular applications. Their execution is controlled so that services they 
support run safely in parallel, and the behaviour of the node and the network is pre- 
dictable. 

Programmable network element 

Active networks are built upon a programmable network infrastructure that mainly 
comprises programmable routers and the open interface for controlling the routers.
Router hardware will be contributed by router vendors while their OS will be en- 
hanced to account for the identified AN requirements. Among potential enhancements 
are advanced IP services, e.g. Diffserv, for selective packet processing, partitioning
of resources such as virtual networks to support VPNs, and native mechanisms for
identifying active packets in incoming data flows and distributing them to appropriate
execution environments.
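
The last enhancement, identifying active packets and handing them to the right execution environment, can be sketched as a small demultiplexer. This is a hypothetical illustration: the `ee_id` field, the `Packet` and `Demultiplexer` names, and the choice to forward packets with an unknown identifier unchanged are all assumptions made for the example, not part of the FAIN design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Packet:
    payload: bytes
    # Hypothetical marker: None means an ordinary ("passive") packet,
    # an integer names the execution environment that should process it.
    ee_id: Optional[int] = None


class Demultiplexer:
    """Separates active packets from ordinary traffic (illustrative sketch)."""

    def __init__(self, forward: Callable[[Packet], str]):
        self._forward = forward  # default forwarding path
        self._ees: Dict[int, Callable[[Packet], str]] = {}

    def register(self, ee_id: int, handler: Callable[[Packet], str]) -> None:
        self._ees[ee_id] = handler

    def dispatch(self, packet: Packet) -> str:
        if packet.ee_id is None:
            return self._forward(packet)
        handler = self._ees.get(packet.ee_id)
        if handler is None:
            # Assumption: unknown EE identifiers fall back to plain forwarding.
            return self._forward(packet)
        return handler(packet)
```

In a real router this classification would sit in the fast path, which is why the text asks for native support rather than a software layer like this one.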

Figure 3. Initial FAIN Active Node Architecture



In addition, an open node interface that actually represents an abstraction of router 
resources ranging from computational resources (CPU, memory, etc.) to packet for- 
warding resources (bandwidth, buffer, etc.) will be defined. The specification of the 
open AN node interface may also serve as an input to relevant standardisation
activities, e.g. IEEE P1520 [3] or IETF [13] and other consortia [18].

Active Node Platform 

The node platform can be viewed as a container of AN node services, called hereafter 
facilities. These facilities represent the local node services and provide the foundation 
for the execution of service components, which are usually in a network-wide scope. 
Example facilities are: 

• Binding facilities that support flexible and dynamic QoS binding of distributed
components; 

• Management facilities that provide transparent access of resources (e.g. a MIB), 
and enforce policies when necessary; 

• Service facilities that support distribution (downloading or injection) of
components (as CORBA [6] objects, mobile agents [11] or program segments carried by
packets);

• Security facilities that perform trust-based authentication of software components 
when distributed, and runtime state checking to ensure safe execution. 

• Communication facilities that provide the service components with connectivity 
using flexible protocol bindings. 

To guarantee a secure and fair use of resources, the platform defines a resource con- 
trol framework that partitions and allocates resources (including computing resources 
such as CPU time and memory, and network resources such as bandwidth and routing 
table). The framework implements the API as an abstraction of the partitioned re- 
sources, which will be used by an execution environment. It also implements a polic- 
ing entity that enables policy-based management, i.e. enforcing and checking the ac- 
cess to node resources. 
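
A minimal sketch of such a resource control framework is given below, under stated assumptions. The class and method names, the resource keys and the two exception types are hypothetical. The sketch shows only the two roles named above: partitioning node capacity among execution environments, and policing each request against the partition of the requesting environment.

```python
class AccessDenied(Exception):
    """The execution environment has no partition for this resource."""


class QuotaExceeded(Exception):
    """The request would exceed the execution environment's partition."""


class ResourceController:
    """Partitions node resources among EEs and polices their use (sketch)."""

    def __init__(self, capacity):
        self._capacity = dict(capacity)  # total per resource, e.g. {"cpu_ms": 100}
        self._shares = {}                # ee -> {resource: partition size}
        self._used = {}                  # ee -> {resource: amount in use}

    def partition(self, ee, shares):
        if ee in self._shares:
            raise ValueError(f"{ee} already has a partition")
        for resource, amount in shares.items():
            committed = sum(s.get(resource, 0) for s in self._shares.values())
            if committed + amount > self._capacity.get(resource, 0):
                raise ValueError(f"partitions would over-commit {resource}")
        self._shares[ee] = dict(shares)
        self._used[ee] = {resource: 0 for resource in shares}

    def _check_access(self, ee, resource):
        # Policing step: an EE may only touch resources it was given.
        if resource not in self._shares.get(ee, {}):
            raise AccessDenied(f"{ee} has no partition of {resource}")

    def available(self, ee, resource):
        self._check_access(ee, resource)
        return self._shares[ee][resource] - self._used[ee][resource]

    def acquire(self, ee, resource, amount):
        if amount > self.available(ee, resource):
            raise QuotaExceeded(f"{ee} exceeds its share of {resource}")
        self._used[ee][resource] += amount

    def release(self, ee, resource, amount):
        self._check_access(ee, resource)
        self._used[ee][resource] = max(0, self._used[ee][resource] - amount)
```

Because every request passes through `acquire`, one environment exhausting its own share cannot reduce what is available to another, which is the fairness property the framework is meant to provide.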

The resource framework and the active network facilities will be designed as the 
services of a distributed processing environment (DPE). FAIN proposes to allow im- 
plementations of these services in different DPEs, depending on specific requirements 
in terms of performance, or functionality. A DPE could be based on TINA-DPE [24],
a real-time ORB, a Java virtual machine [27], a mobile agent platform [14], or other
distributed platforms. As a major contribution, an AN DPE will integrate a dedicated
ORB and a mobile agent platform to provide a full range of signalling services and 
real-time QoS control for distributed applications. In order to support the needs of 
distributed multi-media and real-time bound applications, a specific execution envi- 
ronment will be provided which is optimised for high performance, i.e. high packet 
processing rates and low latency. Ultimately, active network services can be provided to
satisfy the very diverse requirements of applications in the future information society.

The node platform provides the basic functions on which execution environments rely. As
such, it takes the form of an OS, manages the resources of the active node and medi- 
ates the demand for resources, including transmission, computing, and storage. It thus 
isolates EEs from the details of resource management and from the effects of the be- 
haviour of other EEs. The EEs in turn hide most of the details of the platform from the 
users and implement the Network API. 

AN research has advocated the co-existence of a few Execution Environments (EE)
at each node, each of which implements a different virtual machine [5] on
top of the NodeOS [19]. To the best of our knowledge, no project has implemented
and integrated such an architecture in a large-scale deployment. The only similar effort
we are aware of is the Tempest Framework [17] for ATM networks. 

Service Execution Environment 

Supported by the node platform, an active node allows the existence of a variable number
of environments for execution of software components. These environments can be
built around different DPEs, and so it is very likely that heterogeneous components
co-exist in a single active node. The interoperation of components running in the
same type of environment is guaranteed by the underlying DPE (e.g., by IIOP). The
interaction of components in different environments will be an open issue to investi- 
gate in the project, towards a full solution for interoperability of active networks. 
Distribution of the components into environments relies on the underlying active net- 
work facilities. 

An EE can be created by or on behalf of the consumer or AN solution providers, to 
meet application-specific requirements. Such an EE supplies a restricted form of 
service programming, in which the user supplies a set of parameters and/or simple 
policy specifications. In any case, the program may be carried in-band with the packet 
itself, or installed out-of-band. Out-of-band programming may occur in advance of 
packet transmission or on-demand, upon packet arrival automatically (e.g. when in- 
structions carried in a packet invoke a method not present at the node, but known to 
exist elsewhere), or under explicit user control. These EEs are customisable via inter- 
faces visible to the trusted users of the EE. Dynamic customisation of these EEs is an 
important issue to address. Such a capability on one hand enables maximal customis- 
ability by users, but on the other, raises serious concerns for security. Spawning these 
application-specific EEs could be done by a bootstrap EE owned by a trusted authority,
i.e. active network operators. One key novel capability envisaged in the FAIN project 
is the existence of EEs that run the middleware components of dedicated TINA-DPE 
and Mobile Agents. These middleware components would facilitate further expansion 
of the scope and flexibility for the mobility and programmability in the Active Net- 
works. 
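
The three ways of getting a program to a node described above (in-band with the packet, out-of-band in advance, and out-of-band on demand) can be sketched as follows. This is a hypothetical illustration: the packet layout as a dictionary, the `fetch_code` callback that stands for "known to exist elsewhere", and all names are assumptions for the example.

```python
class CodeNotFound(Exception):
    """No code for the requested method is available anywhere."""


class ActiveNode:
    """Resolves the method a packet invokes (illustrative sketch)."""

    def __init__(self, fetch_code):
        self._methods = {}             # methods installed at this node
        self._fetch_code = fetch_code  # hypothetical lookup at a remote code source
        self.fetch_count = 0

    def install(self, name, func):
        # Out-of-band programming in advance of packet transmission.
        self._methods[name] = func

    def process(self, packet):
        name = packet["method"]
        if "code" in packet:
            # In-band: the program travels with the packet itself.
            return packet["code"](packet["data"])
        if name not in self._methods:
            # Out-of-band, on demand: the method is not present at the node,
            # so it is fetched once and kept for later packets.
            self.fetch_count += 1
            func = self._fetch_code(name)
            if func is None:
                raise CodeNotFound(name)
            self._methods[name] = func
        return self._methods[name](packet["data"])
```

The sketch deliberately leaves out authorisation. In the architecture described here, any of the three paths would be subject to the security facilities of the node platform before code is accepted.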

Generic EEs will be developed to support particular types of services. An EE special- 
ised for management services will host the components conveying management func- 
tions, e.g. node configuration. It will define an interface accessible by node managers 
to customise the environment parameters so that a policy can be enforced dynamically. 
Generic management functions will be developed in the management EE as EE- 
inherent components to support application management requirements. 

Another generic EE foreseen as critical for provisioning active networks is one for 
protocols/signalling that perform network control functions, e.g. packet scheduling. 
Such an environment should smoothly support dynamic provision of protocol
components, and their update. The result is that new signalling can be safely
deployed and executed. Multi-media applications and real-time bound applications need
high-performance active network services, which currently cannot be deployed in a
full-featured ORB-based execution environment. For such applications, a high-speed
execution environment is provided. It will be based on the highly efficient Node OS,
augmented with a thin layer of resource access functions as a minimum.

Service programming 

In an active node, new services can be introduced as distributed components. These 
components will be downloaded or injected using distribution mechanisms in the node 
platform and executed in execution environments. Each such component is allowed to
interact with others and with the services provided by the platform, e.g. the resource
manager and facilities, through the Node API. The results of the interaction lead to the
execution of the services and changes of the node state. These components can be remotely
controlled or dynamically programmed, and thus the services they implement can be
flexibly customised.

As envisioned, applications built either as value-added solutions or as
consumer-specific services are both based on this service programming paradigm, and aligned
with the interface specification identified in the FAIN Node architecture.

The architecture proposed in the project is generic and supports specialisation
according to different needs of service provision, security and management. For
example, an Active Network Operator (ANO) can implement only the service
environments in an embedded OS, which controls the data path and provides the expected
performance when re-routing packets. Safe execution is then guaranteed by trust-based
download, i.e. only the ANO itself is allowed to download the service components.



5 FAIN Network Management and Applications 

Based on the network API, two different end-to-end Active Network solutions will be
designed and developed, demonstrating the potential capabilities of the FAIN system. The
two case studies will be:

• Policy-based Active Network Management, which will provide flexible network 
configuration, efficient network error detection and dynamic resource management 
through continuous interworking among active nodes. 

• Dynamic Creation of Protocols, which will enable the rapid provision and update 
of protocol stacks. 

The objective of developing these two services on top of the FAIN architecture is to 
demonstrate the benefits of active networks in general and to evaluate the applicability 
of the FAIN architecture in particular. To demonstrate the applicability of the FAIN 
architecture with respect to the provisioning of end-to-end network services, two case 
studies will be designed and implemented: Policy-Based Network Management and a 
demonstration of the dynamic creation of protocols. 

Policy-based Active Network Management 

Policy-Based Networking (PBN) is currently promoted by several network equipment 
vendors (e.g. DTMF) and is standardised within the IETF Policy working group. The 
current goal of PBN is to provide facilities, which allow control the multiple types of 
devices that must work in concert across even a single domain. Examples of services, 
which can be controlled by PBN, are currently mainly Quality of Services (QoS) res- 
ervations. Examples of devices, which can be “PBN-enabled”, are hosts (clients and 
servers), routers, switches, firewalls, bandwidth brokers, sublet bandwidth managers 
and network access servers. Policies are defined as rules governing the allocation of 
network resources to certain applications/users. The IETF Policy WG defines a scal- 
able framework for policy administration and distribution that will allow 
interoperability among the multiple devices that must work together to achieve a consistent 
implementation of a network administrator’s policy. For this reason, directory 
schemas are standardised in order to enable the various network devices to interpret 
the configuration information consistently. For the distribution of policy information, 
IETF protocols such as LDAP, DIAMETER, COPS are used. 
In FAIN, service logic for the PBN-based provisioning of a sample service (e.g. virtual 
private networks or QoS policies, depending on progress in the IETF) will be 
implemented on top of the Node API. With this case study, we expect to be able to 
demonstrate the following benefits of active networks: 

• Flexibility: Using active networks, policies need not be expressed only as data 
structures, which are limited by the constraints imposed by standardised database 
schemas; they can also be expressed quite flexibly as active code. 

• Simplified service deployment: Since all the FAIN nodes support the same Node 
API, we expect that the same PBN service logic may run on the different nodes. In 
contrast to current approaches, where the service logic has to be developed indi- 
vidually by each node vendor, this results in an improvement of the service devel- 
opment and deployment process with respect to costs and time to market. 

• Benchmarking: Since PBN-enabled network devices are expected to be available 
on the market soon, the PBN case study allows benchmarking with existing PBN 
implementations, in particular with respect to performance, interoperability and 
reliability. 

Dynamic Creation of Protocols 

In a second case study we will demonstrate how application-specific code can be 
injected into active networks in order to provide application-specific network services. 
A specific application/service has still to be chosen for that purpose. Several applica- 
tions which come into consideration for that purpose have already been mentioned 
(e.g. QoS signalling, Web Caching/Web Content Adaptation/Web Switching, reliable 
multicast...). 

While in the PBN case study, an existing service will have been prototyped with 
FAIN, this second case study will demonstrate a distinct feature of Active Networks, 
namely that they allow the rapid introduction of new network protocols by third par- 
ties or even by skilled end-users. Properties of the architecture which are important for 
that purpose and which will be demonstrated include: 

• interference-free execution of the different protocol stacks (i.e. a malicious proto- 
col stack does not affect the correct operation of other protocol stacks); 

• security issues; 

• monitoring and logging of service execution, in particular with respect to resource 
consumption. 



6 FAIN Test Bed 

In the FAIN project a networking test bed will be built. It has at its core a number of 
sites running an Active Network Node, which is designed and implemented within the 
project (Figure 4). The individual sites are connected through IP tunnels over the 
Internet (possibly with guaranteed quality of service), leased lines or native ATM links. 
The main purpose of this test bed is to assess the results of the project using experi- 
mental scenarios and quantitative measurements. 
Figure 4. Active Network Test Bed 

By running different active network services on the test bed, it will be shown that the 
interworking of different active network implementations from various partners is 
possible, thereby providing a worldwide test bed. The different implementations will 
be built according to the specifications developed on the FAIN Node Architecture and 
will support multiple execution environments with different levels of performance. 
The testbed will serve to test the degree of programmability and interworking issues 
among ANNs with different execution environments, while interoperability tests will be 
carried out among ANNs that have common execution environments. 

The envisaged test bed is depicted in Figure 4. The active network consists of 4 - 8 
sites at partner locations, with one or more ANNs and several terminal devices. ATM 
links or IP-tunnels interconnect the sites. The terminal devices access ANNs through 
network technology available, including Ethernet, Fast Ethernet, and ATM. Therefore 
FAIN will form one of the first ever worldwide Active Network test beds interworking 
with other testbeds. Management service provision and dynamic protocol 
provision case studies are envisaged for the demonstration of the FAIN test bed. 



7 Conclusion 

This paper gives an overview of the IST Project FAIN, a three-year project, whose 
main task is to develop and validate an open, flexible, programmable and dependable 
network architecture based on novel active node concepts. The generic architecture 
for active networks is an innovative integration of active networking, distributed object 
and mobile agent technology. The project will contribute to the establishment and 
operation of a worldwide active networks and services test bed. The proposed architecture 
and enterprise model of the initial FAIN specifications make possible the 
development, provision and validation of a novel Active Networks architecture for 
future IP networks. The trials envisaged in the FAIN project will demonstrate inter- 
connectivity across a worldwide active networks infrastructure in a multi-provider 
multi-domain environment. 
Abstract. A primary goal of active networking is to increase the pace of 
network evolution. Evolution is typically achieved via extensibility; that 
is, typical active network implementations provide an interface to extend 
network nodes with dynamically loaded code. Most implementations em- 
ploy plug-in extensibility, a technique for loading code characterized by 
a concrete, pre-defined abstraction of future change. While flexible and 
convenient, we argue that plug-in extensibility alone is not sufficient for 
true network evolution. Instead, we propose dynamic software updating, 
a technique that reduces the a priori assumptions of plug-in extensibil- 
ity, improving flexibility and eliminating the need to pre-plan extensions. 
However, this additional flexibility creates issues involving validity and 
security. We discuss these issues, and describe the state-of-the-art in sys- 
tems that support dynamic software updating, thus framing the problem 
for researchers developing next-generation active networks. 



1 Introduction 

Active networks (AN) are networks whose elements are, in some way, programm- 
able. The idea of AN was developed in 1994 and 1995 during discussions in the 
broad DARPA research community, and since then a significant number of prototypes 
(e.g. [9,23,24,3,18,15]) have emerged. Reviewing the early AN discussions, 
we find one chief motivation driving the initiation of research into AN: faster 
network evolution. For example, the SwitchWare project proposal, from the University 
of Pennsylvania, states ([21], p. 1): 

The pace of network evolution (not switch evolution, network evolution) 
proceeds far too slowly. To a large degree this is a function of standard- 
ization. 

How does active networking address this problem? Early work by Tennenhouse and 
Wetherall argues that active networking facilitates evolution through customization 
([22], p. 2): 
The [active network] programming abstraction provides a powerful platform 
for user-driven customization of the infrastructure, allowing new 
services to be deployed at a faster pace than can be sustained by vendor- 
driven standardization processes. 

For the most part, existing AN implementations embrace this philosophy and em- 
ploy customization as the means to evolution. Usually customizability is achieved 
through extensibility: the network elements may be extended with user (or administrative) 
code to add or enhance functionality. For example, many systems 
allow new packet processing code to be dynamically loaded, as in ALIEN [4] and 
Netscript [24]. In some systems the extensions reside in the packets themselves, 
as in PLAN [9], and ANTS [23]. 

While it is clear that all of these implementations add flexibility to the net- 
work by way of extensibility, we believe that no existing system truly solves 
the problem of slow network evolution. Other authors have cited inadequate re- 
source management and security services as the main inhibitor of active network 
deployment, but we believe the problem is even more fundamental: no existing 
system is flexible enough to anticipate and accommodate the future needs of the 
network. 

In this paper, we look closely at the extensibility strategy employed by many 
AN systems with an eye towards network evolution. In particular, we find that 
implementations at their core rely on plug-in extensibility, a strategy for loading 
code that abstracts the shape of future changes with a pre-defined interface. 
Plug-ins simply and efficiently support a customizable network service, but they 
are not flexible enough to support true evolution. Drawing from our own ex- 
perience and related past work, we propose a more flexible alternative, termed 
dynamic software updating, that we believe can much more effectively address 
the evolution problem. 

Our presentation is divided into three parts. In Section 2, we define plug- 
in extensibility and provide two concrete examples of its use, the Linux kernel 
and the PLANet [11] active internetwork. We then show how many mature AN 
implementations, including ANTS [23], and Netscript [24], among others, also 
employ plug-in extensibility. In Section 3, we explain how plug-in extensibility 
is inherently limited with respect to how a system may change; here we present 
some concrete experience with PLANet. Finally, in Section 4, we propose an al- 
ternative to plug-in extensibility, termed dynamic software updating. We explain 
the benefits and potential problems with this approach, and then point to past 
and future work that will make it a reality. We conclude in Section 5. 

2 Plug-In Extensibility 

Most AN systems provide network-level extensibility by allowing network nodes 
to dynamically load new code. This is a popular technique in areas outside 
of AN as well, including operating systems (OS’s), like Linux and Windows, 
and web browsers. When a new service is needed, some code is loaded that 
implements the service. Typically, the code is constrained to match a pre-defined 
Fig. 1. Plug-in extensibility: extensions are “plugged-in” to an extension interface 
in the running program 

signature expected by the service’s clients. We refer to this approach as plug-in 
extensibility. 

Essentially, plug-in extensibility is a technique that abstracts the shape of 
loadable code. Loaded code is accessed by the running program, the client, 
through an extension interface. Extensions, while internally consisting of ar- 
bitrary functionality, may only be accessed by the client through the extension 
interface, which does not change with time. This idea is illustrated abstractly in 
Figure 1. In this section we present some examples of how this works, both to 
make the ideas concrete, and to demonstrate some problems and limitations of 
this approach. The impatient reader may skip to Section 3 where we argue that 
plug-in extensibility is insufficient for evolution. 

2.1 Linux 

The Linux kernel code uses plug-ins extensively; plug-ins are called modules in 
Linux terminology. Once a module has been dynamically linked, an initializa- 
tion routine is called, which alters some of the kernel’s currently visible data- 
structures to point to the new code. Similarly, when a module is unloaded, a 
cleanup function is called to remove any vestiges of the module from these data- 
structures. Web browser plug-ins are implemented using a similar technique. 

A specific example for the Linux network stack is shown pictorially in Fig- 
ure 2, where the code is divided into user code, the kernel socket code (which 
is the client in this case), and the plug-in code; time proceeds to the right. The 
letters label important points in the execution and the text that follows is keyed 
to these labels. 

Suppose a user attempts to open a socket that will use the IPX protocol 
family (A). The kernel first checks a list of structures, indexed by the protocol 
family, to see if the IPX handler is present (B). Each handler structure consists 
Fig. 2. Linux protocol code 



of a number of function pointers that implement the functions expected of a 
network protocol, effectively defining the extension interface, or signature, of an 
abstract protocol handler. If the IPX handler is not present in this list, then the 
socket code attempts to load a module that implements it (C). During loading, 
the IPX module’s initialization function will be invoked (D). This function al- 
locates and initializes a new handler structure for its functions, and stores the 
handler in the kernel’s handler list. Afterwards, the socket code checks the list 
again (B’); when found, the socket code will invoke the handler’s socket creation 
function (F), which will return a new socket structure. The socket code then 
keeps track of the structure in its file descriptor table, and returns the file descriptor 
to the user (G). After some period of quiescence, or by user-directive 
when the handler is not in use, the handler may be unloaded, which will cause 
the removal of the handler structure from the kernel list. 

This technique allows the protocol handling code in the kernel to be ex- 
tensible. Protocol handlers are treated abstractly by the socket code via the 
extension interface. In this way, changes may be made on the granularity of pro- 
tocol handlers — the user and socket portions of the figure will always be fixed, 
but the plug-in code can change. New protocol handlers for different protocol 
families may be added, and existing handlers may be removed and replaced with 
new versions. All loaded handlers must match the extension interface of the 
socket code if they are to be accessed correctly. 
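
The lookup, load, and retry sequence described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the actual Linux kernel code: the names proto_handler_t, register_handler, unregister_handler, load_module and open_socket, the fixed-size table, and the family number PF_IPX_DEMO are all hypothetical, and the "module loader" is a stub that knows a single built-in handler.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FAMILIES 8
#define PF_IPX_DEMO  4   /* hypothetical family number, for this example only */

/* Extension interface: every protocol plug-in must provide these functions. */
typedef struct {
    int (*create_socket)(void);   /* returns a socket id, or -1 on failure */
} proto_handler_t;

/* Client-side list of handlers, indexed by protocol family (step B). */
static proto_handler_t *handlers[MAX_FAMILIES];

/* Called from a plug-in's initialization routine (step D). */
void register_handler(int family, proto_handler_t *h) {
    assert(family >= 0 && family < MAX_FAMILIES);
    handlers[family] = h;
}

/* Called from a plug-in's cleanup routine when it is unloaded. */
void unregister_handler(int family) {
    assert(family >= 0 && family < MAX_FAMILIES);
    handlers[family] = NULL;
}

/* --- plug-in code: a stand-in for the IPX module --- */
static int ipx_create_socket(void) { return 42; }
static proto_handler_t ipx_handler = { ipx_create_socket };
static void ipx_init(void) { register_handler(PF_IPX_DEMO, &ipx_handler); }

/* Stub module loader (step C); only the IPX stand-in is "loadable" here. */
static int load_module(int family) {
    if (family != PF_IPX_DEMO) return -1;
    ipx_init();
    return 0;
}

/* Client code: look up the handler, load it on demand, then call it
   through the extension interface (steps B, C, B', F). */
int open_socket(int family) {
    if (family < 0 || family >= MAX_FAMILIES) return -1;
    if (handlers[family] == NULL && load_module(family) != 0) return -1;
    return handlers[family]->create_socket();
}
```

Note that open_socket never names the IPX code directly; it only sees the function pointers in proto_handler_t, which is what makes the handler replaceable and also what fixes the shape of every future handler.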

There are, however, some important limitations. First, we cannot change 
the standard procedure for dealing with plug-ins. For example, while we could 
dynamically change the handler signature with some new function types, the old 
client code will never use them (because it is compiled to use the old signature). 
Similarly, we cannot usefully change the types of current functions, because they 
will be interpreted using the old types. We could solve this problem if we could 
alter the client code to take advantage of the new or changed features. But 
this is not possible because the system has not been programmed to allow the 
client code to change. Thus, to make these kinds of changes, we would have to 
recompile and redeploy the system. 
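
The first limitation can be made concrete with a small C sketch. The types handler_v1_t and handler_v2_t and the function client_open are hypothetical names chosen for this illustration; the point is only that a client compiled against the old signature has no way to reach an operation that a newer plug-in adds.

```c
#include <assert.h>
#include <stddef.h>

/* Signature the client code was compiled against. */
typedef struct {
    int (*create)(void);
} handler_v1_t;

/* A newer plug-in extends the signature with a statistics operation. */
typedef struct {
    handler_v1_t base;     /* old interface, kept as the first member */
    int (*stats)(void);    /* new operation, unknown to old clients   */
} handler_v2_t;

static int stats_calls = 0;   /* counts how often the new operation runs */

static int new_create(void) { return 7; }
static int new_stats(void)  { stats_calls++; return 0; }

static handler_v2_t new_handler = { { new_create }, new_stats };

/* Old client: it only knows handler_v1_t, so it can only call create.
   Nothing in this function can ever invoke stats. */
int client_open(handler_v1_t *h) {
    assert(h != NULL && h->create != NULL);
    return h->create();
}
```

Passing &new_handler.base to client_open works, but the stats function is reachable only by code that was written with handler_v2_t in mind, which is exactly the code that cannot be changed in a plug-in system.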

Another limitation is that plug-ins may not be updated (that is, replaced 
with an “improved” implementation) while in use. The Linux socket code forbids 
this in particular because there is, in general, no way to transfer the state of the 
current version of the module to the new version, which would be needed to 
allow open connections to operate seamlessly. Such a state transfer would have 
to be anticipated and facilitated by the client code (we demonstrate an example 
of this idea in the next section). Disallowing updates to actively running code 
is probably reasonable for this application, but we will argue that it is not an 
ideal choice for active networks. 

2.2 PLANet 

Although the details differ between systems, AN implementations make use of 
plug-in extensibility in much the same way as Linux. As an example of this, we 
will first focus on our own system, the PLANet [11] active internetwork. Other 
AN systems will be shown to fall into the same mold in Subsection 2.3. 

PLANet is based on a two-level architecture that provides lightweight, but 
limited, programmability in the packets of the network, and more general- 
purpose extensibility in the routers. Packet headers are replaced by programs 
written in a special-purpose language PLAN [9], resulting in much greater flex- 
ibility than traditional headers. When packets arrive at a node to be evaluated, 
their PLAN programs may call node resident service routines, which form the 
second level of the architecture. The service routine space is extensible, allowing 
new service routines to be installed or removed without stopping the execution 
of the system. This is implemented by dynamically linking code that imple- 
ments the new service and registering it in a symbol table used by the PLAN 
interpreter. 

The PLANet service-level uses plug-in extensibility. Consider the following 
example. Suppose we want to add a new PLAN service getRand that returns a 
pseudo-randomly generated integer. We must load some new code to implement 
the service. We present this code piecemeal below, in C.^ 

At the core of the new functionality is the function rand, which returns a 
randomly generated integer (code not shown). Additionally, we must include an 
interface function randIfc, which mediates access between the PLAN interpreter 
and the actual code. The arguments to all service interface functions include a 
structure active_packet_t, which describes the current PLAN packet, and a 
list of PLAN values (represented as a null-terminated array), which are the 
actual arguments to the service provided by the PLAN program. The value_t 
structure is a tagged union that describes all possible PLAN values. In this 
case, the arguments are ignored (the list of values should be empty), while the 
returned value_t is tagged with the INT tag: 

value_t *randIfc(active_packet_t *p, value_t *args[]) { 
    value_t *v = malloc(sizeof(value_t)); 
    v->tag = INT; 
    v->val = rand(); 
    return v; 
} 

^ The code examples shown here are in C, but in reality, PLANet uses the type-safe 
language Ocaml [14] for its implementation. 
Finally, the interface function must be added to the PLAN symbol table, so that 
it is visible to PLAN programs. This is done via the register_svc function, 
which takes as arguments the name that the function will be referred to by 
PLAN programs, and a pointer to the interface function. When the new code is 
loaded, its init function will be executed (just as in the Linux protocol handler 
example), which calls register_svc with the name getRand and a pointer to 
the interface function: 

extern void register_svc(char *planSvcName, 
                         value_t *(*ifcFun)(active_packet_t *p, 
                                            value_t *args[])); 

void init(void) { 
    register_svc("getRand", randIfc); 
} 

Why is this plug-in extensibility? The giveaway is the type of register_svc. All 
service routines that may be added and later accessed by PLAN functions must 
correspond to the type of ifcFun, which is essentially the extensibility interface 
for the system. In the Linux protocol code, plug-ins are allowed to be new or 
improved protocol handlers; for PLANet, plug-ins are PLAN services. Note that 
services are not the only kind of plug-in in PLANet, as we shall see in the coming 
sections. 
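
To see what might lie behind register_svc, the following is a minimal sketch of a service symbol table. It rests on several assumptions that do not come from PLANet itself (which is written in OCaml): the layouts of active_packet_t and value_t, the fixed table size MAX_SVCS, the svc_fn typedef, the const-qualified name parameter, and the lookup function call_svc are all invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the PLANet types (assumed layouts). */
typedef struct { int unused; } active_packet_t;
typedef enum { INT } tag_t;
typedef struct { tag_t tag; int val; } value_t;

/* The extensibility interface: every service must have this type. */
typedef value_t *(*svc_fn)(active_packet_t *p, value_t *args[]);

#define MAX_SVCS 16
static struct { const char *name; svc_fn fn; } svc_table[MAX_SVCS];
static int n_svcs = 0;

/* Add a service to the symbol table, or replace one of the same name. */
void register_svc(const char *planSvcName, svc_fn ifcFun) {
    for (int i = 0; i < n_svcs; i++) {
        if (strcmp(svc_table[i].name, planSvcName) == 0) {
            svc_table[i].fn = ifcFun;   /* newer version replaces older */
            return;
        }
    }
    assert(n_svcs < MAX_SVCS);
    svc_table[n_svcs].name = planSvcName;
    svc_table[n_svcs].fn = ifcFun;
    n_svcs++;
}

/* What an interpreter could do when a PLAN program calls a service by
   name. Returns NULL if no such service is registered. */
value_t *call_svc(const char *name, active_packet_t *p, value_t *args[]) {
    for (int i = 0; i < n_svcs; i++)
        if (strcmp(svc_table[i].name, name) == 0)
            return svc_table[i].fn(p, args);
    return NULL;
}

/* The plug-in: interface function plus init, as in the text. */
static value_t *randIfc(active_packet_t *p, value_t *args[]) {
    (void)p; (void)args;                /* arguments are ignored */
    value_t *v = malloc(sizeof(value_t));
    if (v == NULL) return NULL;
    v->tag = INT;
    v->val = rand();
    return v;
}

void init(void) { register_svc("getRand", randIfc); }
```

Whatever a service does internally, call_svc can reach it only through svc_fn, so the interpreter, the table, and the value representation all remain outside the set of things a plug-in can change.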

2.3 Other Active Networks 

Just because PLANet uses plug-in extensibility does not imply that all active 
network approaches are so limited. For the remainder of this section we touch on 
some of the more mature AN implementations and show that while the makeup 
of plug-ins differs, all systems use plug-in extensibility. Typically, systems fall 
into two categories, those based on active packets (or capsules), and those based 
on active extensions. For the former, we concentrate on the Active Network 
Transport System [23] (ANTS) as the representative system, and for the latter 
we focus on Netscript [24]. 



ANTS and Active Packet Systems ANTS is similar to PLANet in that it 
makes use of packets that (logically) contain programs, termed capsules. These 
capsules are written in Java, and dynamically linked into the ANTS implemen- 
tation with access to a node programming interface including a subset of the 
JVM libraries and some additional utilities. Rather than carry the packet code 
in-band, the code occurs in the packet by reference. In the common case, the 
reference will be to code present in the node’s code cache; otherwise, a built-in 
distribution system will retrieve and dynamically link the code. 

In ANTS, the plug-ins are the capsule programs themselves. Each capsule 
plug-in essentially has the signature shown below: 
typedef struct { 
    void (*f)(fields_t *pkt_fields, 
              value_t *args[], 
              void *payload); 
} capsule_t; 

All packet programs are a single function whose arguments are the packet’s fixed 
fields (like source and destination address), some protocol-specific arguments 
(like a sequence number, or flow identifier), and finally a generic payload. 

Because capsules are the only plug-in, much of the ANTS system is not 
subject to change; this includes the code distribution service, the entire node 
programming interface, the packet marshalling and unmarshalling code (for the 
protocol-specific arguments), the code cache, the security enforcement mecha- 
nisms, etc. The only way that an ANTS node can change with time is by loading 
different capsule programs. If some aspect of the node programming interface, 
or the distribution protocol, needs to be changed, then the nodes would have to 
be changed, recompiled, and redeployed. 

In general, this reasoning applies to other active packet systems, like Smart- 
Packets [19], PAN [17], and the packet programs of PLANet. 



Netscript and Active Extension Systems Netscript is a system for writing 
composable protocol processors. The central abstraction in Netscript is called a 
box, which is conceptually some piece of code with a number of in-ports and out- 
ports; in-ports receive incoming packets, and out-ports transmit packets. Boxes 
are dynamically loaded and connected together by these ports to form modular 
protocol processing units. 

In Netscript, the form of plug-in is the box: all loadable objects must subclass 
the Box class (Netscript uses Java as its underlying implementation). Because 
the majority of a system built with Netscript is composed of boxes, much of 
the system is subject to change. Exceptions include the Java libraries (the box 
programming interface, in some sense), the box-loading system, and the top-level 
packet filter. However, some boxes are in essence unchangeable because 
they encapsulate state, and thus cannot be safely replaced, along the same lines 
as the Linux example. 

An interesting way to improve the evolutionary ability of a Netscript system 
would be to wrap the library routines in boxes. For example, we could create a 
HashBox box to represent the Hashtable class. To create a new “hashtable”, we 
would send a message to one of the box’s in-ports, and it would emit an object on 
one of its out-ports. By sending the object and some arguments through various 
HashBox in-ports, and extracting results from selected out-ports, we simulate the 
normal function call semantics. The benefit would be that an improved hashtable 
implementation could replace the old one, if needed. However, this technique is 
clearly tedious and error-prone, and thus not really practical. 

A number of other systems are similar in spirit to Netscript. ALIEN [3] was 
primarily designed for building modular networking code. CANES [15] makes 
use of program templates (called the underlying programs) parameterized by 
some slots that contain user-defined code. In all cases, the elements that may be 
loaded are limited in the same way as Netscript boxes. 



Other Systems Some AN systems do not rely on interpretation or dynamic 
linking as their underlying means of extensibility. Instead, they use a more traditional 
hardware-based process model for extensions. For example, ABLE [18]’s 
extensions are processes spawned by the node’s session manager. 

In these systems, plug-ins are essentially whole programs whose extensibility 
interface consists of the allowed IPC mechanisms to the rest of the system. Just 
as plug-ins are limited by the programming interface with which they may be 
called, these programs are limited by their IPC interface. 

3 Why Plug-Ins Are Insufficient 

Plug-in extensibility is convenient, useful, and widespread. Despite this, we be- 
lieve that if AN is to facilitate substantial network evolution we must go beyond 
plug-ins. In this section we argue why this is so, and in the next section we 
propose how to do better. 

3.1 Limitations of Plug-Ins 

Plug-ins are convenient because they abstract the kinds of changes that may be 
made in the future, and thus give the current code an interface to deal with those 
changes. In the Linux case, the socket code does not care what code it is calling, 
only that it will perform the proper kind of function (like setting up a socket 
object and returning it). Similarly with PLAN services, the caller (the PLAN 
interpreter) only cares that the service function performs some action with the 
given arguments and returns a PLAN value. 

However, to create a system that is robust to long-term change, as is the goal 
in active networking, we need to minimize our assumptions about the system. 
Concretely, we want to minimize the size of the unchangeable program. This is the 
part of the program that is not made of plug-ins, and therefore is not amenable 
to change. The larger this part of the program, the more likely that some future 
demand will be impossible to accommodate. To make this point more concrete, 
we consider a practical example that we encountered with PLANet. 

3.2 Evolving PLANet’s Packet Queue 

During the development of PLANet, we decided that a useful AN application 
would be to allow administrators to change their queuing discipline on demand, 
to meet current network conditions.^ In particular, we wanted to be able to 
change from a single FIFO queue shared by all devices to a set of queues, one 

^ In fact, this application arose out of the need to demonstrate network evolution. 
static pktQ_t packetQ = global packet queue 

void queue_packet(active_packet_t *p) { 
    queue p onto packetQ; 
} 

active_packet_t *dequeue_packet() { 
    if queue length is > 0 
        dequeue top element from packetQ; 
    else 
        throw an exception 
} 



Fig. 3. Straightforward packet queue implementation 



per device, serviced round-robin to obtain fair-queuing. But we could not do this 
unless queues could be plugged-in; otherwise, we could not force the current code 
to use the new implementation. Therefore, at the time, we coded the queuing 
discipline to be a plug-in in PLANet. 

Our initial queuing implementation, part of the unchangeable program, 
is shown in Figure 3, defining queuing operations like queue_packet, 
dequeue_packet, etc., to operate on a globally defined queue. This is simple 
and easy to read. 
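
For readers who want to experiment, the pseudocode of Figure 3 might be fleshed out as follows. This is a sketch under assumptions rather than PLANet's code: pktQ_t is taken to be a singly linked FIFO, active_packet_t is reduced to a single id field, and, since C has no exceptions, dequeue_packet returns NULL on an empty queue instead of throwing.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int id; } active_packet_t;   /* assumed, simplified */

typedef struct node {
    active_packet_t *pkt;
    struct node *next;
} node_t;

typedef struct { node_t *head, *tail; size_t len; } pktQ_t;

static pktQ_t packetQ = { NULL, NULL, 0 };    /* global packet queue */

/* Append p to the tail of the global queue. */
void queue_packet(active_packet_t *p) {
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->pkt = p;
    n->next = NULL;
    if (packetQ.tail != NULL)
        packetQ.tail->next = n;
    else
        packetQ.head = n;
    packetQ.tail = n;
    packetQ.len++;
}

/* Remove and return the oldest packet, or NULL if the queue is empty. */
active_packet_t *dequeue_packet(void) {
    if (packetQ.len == 0) return NULL;
    node_t *n = packetQ.head;
    active_packet_t *p = n->pkt;
    packetQ.head = n->next;
    if (packetQ.head == NULL) packetQ.tail = NULL;
    packetQ.len--;
    free(n);
    return p;
}
```

Callers are bound directly to these two functions and to the one global packetQ, which is what makes the code short and also what makes it impossible to swap the discipline at runtime.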

To make queues a plug-in, we had to add two things to the straightforward 
implementation. First, we defined the type of the plug-in as a series of function 
pointers to the queuing operations. We then created a default component using 
the straightforward implementation presented in Figure 3, and provided a means 
to access and change the implementation at runtime. All of this additional code 
is shown in Figure 4. Here, the default queue implementation is created with 
defaultQ. Users of the packet queue access its functionality through the interface 
functions queue_packet, dequeue_packet, etc. These functions call the plug-in’s 
functions and return their results. Future replacements are installed using 
install_qp. This setup is almost exactly the same form as the Linux protocol 
code, but with one difference: install_qp transfers the old state (the packets in 
the queue) to the new implementation, and thus the timing of the change is not 
limited to when the queue is inactive. All queue replacements are constrained to 
match the type of queue_plugin_t. 
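
The state transfer performed by install_qp can be sketched as a drain-and-refill loop. The code below is an illustration under assumptions, not the PLANet implementation: both queue implementations are small fixed-capacity arrays, dq returns NULL on an empty queue, active_packet_t is reduced to an id field, and the replacement discipline (lowest id first) merely stands in for the per-device fair queuing discussed in the text.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int id; } active_packet_t;   /* assumed, simplified */

typedef struct {                               /* queue plug-in type */
    void (*q)(active_packet_t *);
    active_packet_t *(*dq)(void);              /* NULL when empty (assumption) */
} queue_plugin_t;

#define CAP 64

/* Default implementation: FIFO in a ring buffer. */
static active_packet_t *fifo_buf[CAP];
static size_t fifo_head = 0, fifo_len = 0;
static void fifo_q(active_packet_t *p) {
    assert(fifo_len < CAP);
    fifo_buf[(fifo_head + fifo_len) % CAP] = p;
    fifo_len++;
}
static active_packet_t *fifo_dq(void) {
    if (fifo_len == 0) return NULL;
    active_packet_t *p = fifo_buf[fifo_head];
    fifo_head = (fifo_head + 1) % CAP;
    fifo_len--;
    return p;
}
static queue_plugin_t fifo_plugin = { fifo_q, fifo_dq };

/* Replacement implementation: serves the lowest packet id first. */
static active_packet_t *prio_buf[CAP];
static size_t prio_len = 0;
static void prio_q(active_packet_t *p) {
    assert(prio_len < CAP);
    prio_buf[prio_len++] = p;
}
static active_packet_t *prio_dq(void) {
    if (prio_len == 0) return NULL;
    size_t best = 0;
    for (size_t i = 1; i < prio_len; i++)
        if (prio_buf[i]->id < prio_buf[best]->id) best = i;
    active_packet_t *p = prio_buf[best];
    prio_buf[best] = prio_buf[--prio_len];     /* fill the hole with the last slot */
    return p;
}
static queue_plugin_t prio_plugin = { prio_q, prio_dq };

static queue_plugin_t *q = &fifo_plugin;       /* current plug-in */

/* Interface functions used by the rest of the system. */
void queue_packet(active_packet_t *p) { q->q(p); }
active_packet_t *dequeue_packet(void) { return q->dq(); }

/* Install a new implementation, moving queued packets across first. */
void install_qp(queue_plugin_t *nq) {
    active_packet_t *p;
    if (nq == q) return;                       /* nothing to transfer */
    while ((p = q->dq()) != NULL)
        nq->q(p);
    q = nq;
}
```

The transfer works here only because both implementations agree on queue_plugin_t; a replacement that needed a different interface, for example a per-device enqueue operation, could not be installed this way.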

While queues are now dynamically updateable, there are two basic prob- 
lems. First, we needed to anticipate not just the need for the implementation 
to evolve, but even the form the evolution should take (that is, the interface of 
the queue plug-in). Second, the code that is “plug-in enabled” is substantially 
more complicated and less clear than the code written in the natural way. We 
can easily imagine that constructing a large system in this way, with many kinds 
of plug-ins, will result in obfuscated, error-prone code. Or equally likely, we can 
imagine that programmers will decide to sacrifice the ability to one day extend 
the code, to make their immediate task easier. 
typedef struct {                        /* Queue plug-in type */ 
    void (*q)(active_packet_t *); 
    active_packet_t *(*dq)(void); 
} queue_plugin_t; 

static queue_plugin_t *q = NULL;        /* Global queue */ 

void defaultQ(void) {                   /* Default implementation */ 
    q = malloc(sizeof(queue_plugin_t)); 
    q->q = queue_packet_old;            /* same as */ 
    q->dq = dequeue_packet_old;         /* straightforward impl */ 
} 

/* User interface */ 
void queue_packet(active_packet_t *p) { 
    q->q(p); 
} 

active_packet_t *dequeue_packet(void) { 
    return q->dq(); 
} 

/* To load a new queue implementation */ 
void install_qp(queue_plugin_t *nq) { 
    /* Move packets in old q to new one, then */ 
    q = nq; 
} 



Fig. 4. Plug-in queue implementation 



There is a more important problem. In this case, we anticipated that queues 
should be plugged-in, and coded the system as such. However, evolution implies 
change in ways we cannot anticipate, and thus may not fit our pre-defined mold. 
For example, the designers of the Internet did not anticipate its current demand 
for Quality of Service — it was specifically excluded from the design of the best- 
effort network service (the telephone network already did QoS well). Yet, high 
demand for QoS is precipitating proposals to change the Internet, including 
diffserv [1], intserv [2], RSVP [6], etc. Therefore, we feel it is not really reasonable 
to think we can choose just the right abstractions now and have those choices 
hold up over many years. 

Ideally, we would make every program component a plug-in, but without the 
problems of code obfuscation and fixed interfaces that we saw above. What we 
really need is a solution that allows more general changes to be made without 
having to choose the form of those changes ahead of time; we shall explore this 
idea in the next section. 
4 Dynamic Software Updating 

Ideally, we would like to code our system in the manner of the simple queue 
implementation, but still be able to support evolution by updating components 
dynamically, as with the plug-in version. Furthermore, evolution would be facili- 
tated if we had more flexibility in how we perform the updating than is typically 
afforded by plug-ins. For example, we would like to be able to change the type 
of components at runtime, rather than limiting the replacement type to match 
the original compile-time type. In this section we argue that rather than just 
plug-in extensibility, active networks require what we term dynamic software 
updating to achieve true evolution. 

We begin by defining the requirements of dynamic software updating, and 
their ramifications in terms of the validity and security of the system. We finish by 
pointing to some promising past and present efforts to realize dynamic software 
updating, with the hope that AN researchers will integrate these techniques into 
their next generation systems. 

4.1 Dynamic Software Updating 

In order to facilitate system evolution, we have the following requirements, which 
comprise dynamic software updating: 

— Any functional part of the system should be alterable at runtime, without 
requiring anticipation by the programmer. In the queue example, we would 
be able to code the queue in the simple, straightforward manner, but still 
change it at runtime. 

— Alterations should not be limited to a predefined structure, i.e. component 
signatures should be changeable, allowing the implementation to evolve as 
demands change. 

For example, suppose we want the queue to track how many packets have 
been processed. With queue plug-ins, while we could dynamically add a new 
queue implementation that counts packets, we could not make this informa- 
tion available; the type of queue_plugin_t (Figure 4) constrains all queue 
replacements to implement exactly the functions listed. Instead, we would 
like to be able to change this type, either to add new functionality, such as to 
count packets, or to alter the types of existing functions. 

— The timing of updates should not be restricted by the system. In the IPX 
example, we could not unload the module while it was in use to replace it with 
a better version. In PLANet and other ANs, some components of the system 
may always be in use; for example, the packet queue may always contain 
some packets. In the queue plug-in code, we dealt with this situation by 
transferring all packets from the old queue to the new at installation time. 
We must allow for a similar mechanism when changes are unanticipated, 
which implies that when new code replaces the old, it must be able to take 
care of transferring any necessary state. 

While a system that meets these requirements will be significantly better 
equipped to deal with network evolution, the additional flexibility leads to com- 
plications. In particular, it becomes more difficult to ensure that a dynamic 
extension is valid, and to secure that extension. 
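
To make the second and third requirements concrete, the following is a minimal sketch in C of an update that changes a component's type and carries its state across. The names queue_v1_t, queue_v2_t, upgrade_queue and dequeue_v2 are hypothetical and are not taken from PLANet, and a packet is reduced to an integer identifier.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 8

/* Old representation: a plain FIFO of packet identifiers. */
typedef struct {
    int items[QCAP];
    size_t len;
} queue_v1_t;

/* New representation: the same FIFO plus a count of processed packets.
   Its type differs from queue_v1_t, which a fixed plug-in interface
   such as queue_plugin_t would not permit. */
typedef struct {
    int items[QCAP];
    size_t len;
    unsigned long processed;
} queue_v2_t;

/* State transformer shipped with the update: copies every queued packet
   from the old queue into the new one, preserving order. */
void upgrade_queue(const queue_v1_t *old, queue_v2_t *nq) {
    assert(old != NULL && nq != NULL && old->len <= QCAP);
    for (size_t i = 0; i < old->len; i++)
        nq->items[i] = old->items[i];
    nq->len = old->len;
    nq->processed = 0; /* the old version kept no history */
}

/* Dequeue in the new version; also maintains the new counter.
   Returns 1 and stores the packet in *out, or 0 if the queue is empty. */
int dequeue_v2(queue_v2_t *q, int *out) {
    assert(q != NULL && out != NULL);
    if (q->len == 0)
        return 0;
    *out = q->items[0];
    for (size_t i = 1; i < q->len; i++)
        q->items[i - 1] = q->items[i];
    q->len--;
    q->processed++;
    return 1;
}
```

The sketch says nothing about when upgrade_queue may safely run; that is the validity question taken up next.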



Validity Returning to the example of PLANet services, say we wish to update 
the register_svc function to additionally include the PLAN type of the service: 

extern void register_svc (char *planSvcName, 
                          value_t * (*ifcFun) (active_packet_t *p, 
                                               value_t *args []), 
                          plan_type_t *argTypes [], 
                          plan_type_t returnType); 

We have added two additional arguments: argTypes, which describes the ex- 
pected types of the PLAN service function’s arguments, and returnType, which 
describes the type of the return value. 

This information will be stored in the service symbol table (along with the old 
information) to type-check the services called by a particular PLAN program. 
To do so, we have to alter the table’s format to include the new fields, which has 
two implications. First, before the new code that implements register_svc can 
be used, the data in the current symbol table will have to be converted to match 
the new type. The second implication is that we will have to change the other 
functions that directly access the table to be compatible with the new type. We 
cannot sidestep these issues, as we did in the IPX handler case, by waiting for 
the symbol table to become empty before making the change because it may 
never become empty. 
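
The first implication, converting the live table to the new format, might look like the following sketch. Everything here is an assumption made for illustration: the entry layouts, the simplified plan_type_t enumeration, the stand-in svc_fun_t signature, and the choice to mark converted entries as untyped because the old table holds no type information.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PLAN_UNKNOWN, PLAN_INT, PLAN_STRING } plan_type_t; /* hypothetical */
typedef int (*svc_fun_t)(int); /* stand-in for the real service signature */

/* Entry format used by the old register_svc. */
typedef struct {
    const char *name;
    svc_fun_t fun;
} svc_entry_v1_t;

/* Entry format needed by the new register_svc. */
typedef struct {
    const char *name;
    svc_fun_t fun;
    int typed;               /* 0 for entries converted from the old table */
    plan_type_t return_type;
} svc_entry_v2_t;

/* Convert up to cap old entries into the new format.
   Returns the number of entries converted. */
size_t convert_table(const svc_entry_v1_t *old, size_t n,
                     svc_entry_v2_t *out, size_t cap) {
    assert(old != NULL && out != NULL);
    size_t i;
    for (i = 0; i < n && i < cap; i++) {
        out[i].name = old[i].name;
        out[i].fun = old[i].fun;
        out[i].typed = 0;                  /* no type information existed */
        out[i].return_type = PLAN_UNKNOWN;
    }
    return i;
}
```

Functions that read the table directly would then have to treat typed == 0 entries as unchecked, which is one instance of the second implication.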

Now we have several new concerns: 

— What about old client code that calls register_svc? This code will still 
expect the same type as before. A quick answer would be that all the old 
code must be updated to use the new type. However, this is not feasible since 
other parties may have loaded code that calls register_svc, and we may 
not have access to that code. We therefore need some way to allow the old 
code to access the new implementation using the old interface. 

— When is a reasonable time to make the change? If the node is accessing 
the table, perhaps in another thread, when the transformation takes place, 
then changes to the table could be lost or made inconsistent. Thus, we need 
to time the transformation appropriately, perhaps with assistance from the 
application. 

To clarify this point, consider that the old version of register_svc is running 
in thread t1 and is just about to add a new entry to the table, when in 
another thread t2 the new version is loaded to replace it. We might naively 
think that, at the time of update, t2 could translate the state from the old 
representation to the new and store this in the new module, similar to what 
we did in the queue example. However, this translation may not correctly 
include changes made to the state by thread t1. At best, all that will happen 
is that the change will not be reflected in the new program. At worst, e.g. 
if the translation begins just after t1 starts to alter the table, the translated 
table will be inconsistent. Therefore, properly timing the changes to ensure 
that state is consistent is extremely important. 
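
One way to time the transformation "with assistance from the application" is to let the application mark points at which an update may be applied. The sketch below is a single-threaded illustration under that assumption; request_update, update_point, table_enter and table_leave are hypothetical names, and a multithreaded node would additionally need to synchronize access to these variables.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*update_fn)(void);

static update_fn pending_update = NULL; /* update waiting to be applied */
static int table_in_use = 0;            /* nonzero while the table is being changed */

int table_version = 1;                  /* example state touched by the update */
void to_version_2(void) { table_version = 2; }

/* Record an update; it is never applied at request time. */
void request_update(update_fn u) {
    assert(u != NULL);
    pending_update = u;
}

/* Bracket every modification of the table. */
void table_enter(void) { table_in_use++; }
void table_leave(void) { assert(table_in_use > 0); table_in_use--; }

/* Called by the application at points it considers safe.
   Returns 1 if a pending update was applied, 0 otherwise. */
int update_point(void) {
    if (pending_update == NULL || table_in_use > 0)
        return 0;
    update_fn u = pending_update;
    pending_update = NULL;
    u();
    return 1;
}
```

With this structure, the state translation in the t1/t2 scenario would be deferred until t1 has left the table.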

These questions follow from the general question of what constitutes a valid 
change to a program. That is, when will an update leave a program in a “rea- 
sonable” state? Not surprisingly, Gupta has shown that determining the validity 
of a change is in general undecidable [7]. Therefore, we must rely on structuring 
our program so as to simplify the question. When we use plug-in extensibility we 
essentially limit the forms that changes may take, and can therefore more easily 
understand their effect on the system. We must similarly learn how to formulate 
sound methodologies, preferably with a formal basis, for ensuring validity when 
making the more sophisticated kinds of changes mentioned here. Because the 
methodology used depends on the kind of change, we do not want to impose a 
general set of restrictions. However, having some notion, whether enforced by 
the system or not, of what constitutes a valid change is critical to the practical 
use of the system. 

Security A topic related to validity is security. Assuming we can avoid integrity 
failures by using type-safe dynamic linking (in a language like Java, or Typed 
Assembly Language [12,16]), we must still worry because the greater a system’s 
flexibility, the greater the risk of problems. For example, in the current plug-in 
version of PLANet, there is no possibility of new code maliciously or inadver- 
tently preventing the operation of the packet processing loop since this code 
was not coded to expect possible change. However, when we add the ability to 
change any part of the system, as proposed above, this property is no longer 
guaranteed, constituting a significant threat to node security. A related problem 
is information security. That is, certain services may contain private information 
that should not be made available to other services. However, if complete access 
to those services is available to all new or updated code, then there can be no 
privacy. 

Both problems may be avoided via module thinning [3,10], a technique 
whereby new code may access old code commensurate with its level of privi- 
lege. For example, a routing table service in the node may allow anyone to read 
the table, but only certain individuals to write to it. This can be controlled 
by thinning the table-writing function from the environment of inappropriately- 
privileged code. 
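
The routing table example can be sketched as a privilege-aware symbol lookup performed when new code is linked. This is only an illustration of the idea, not the mechanism of [3,10]; the privilege levels, the export table, and the function names are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { PRIV_GUEST = 0, PRIV_ADMIN = 1 } privilege_t;
typedef int (*route_fn)(int dest, int next_hop);

static int next_hops[4]; /* toy routing table: destination -> next hop */

static int route_read(int dest, int unused) {
    (void)unused;
    return next_hops[dest & 3];
}

static int route_write(int dest, int next_hop) {
    next_hops[dest & 3] = next_hop;
    return next_hop;
}

typedef struct {
    const char *name;
    privilege_t required; /* minimum privilege needed to link against it */
    route_fn fun;
} export_t;

static const export_t exports[] = {
    { "route_read",  PRIV_GUEST, route_read  },
    { "route_write", PRIV_ADMIN, route_write },
};

/* Resolve a symbol on behalf of newly loaded code. Symbols above the
   caller's privilege are thinned away: the lookup behaves as if they
   did not exist and returns NULL. */
route_fn thin_lookup(const char *name, privilege_t caller) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof exports / sizeof exports[0]; i++)
        if (strcmp(exports[i].name, name) == 0)
            return caller >= exports[i].required ? exports[i].fun : NULL;
    return NULL;
}
```

Because the restriction is applied at link time, unprivileged code never obtains a reference to route_write and no check is needed on each call.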

In general, while the total space of threats to security increases with flexibil- 
ity, the need to deal with these threats is application-dependent. For example, 
the security of a personal computer operating system is probably less important 
than that of a generally-available active network node. 

4.2 Achieving Dynamic Software Updating 

Given the evolutionary benefits of dynamic software updating over plug-in exten- 
sibility, how can we implement it and mitigate its additional problems of validity 
and security? In this subsection, we present some of the more recent work in the 
area and point to some promising approaches. In general, no existing system 
meets all of the requirements we have mentioned. We hope to draw from this 
work to arrive at a more comprehensive solution [8]. 



Erlang Erlang [5] is a dynamically-typed, concurrent, purely functional 
programming language designed for building long-running telecommunications 
systems. It comes with language-level and library support for the dynamic update 
of program modules. If the old module is active when the update occurs, then 
it continues to be used until called from an external source. If any call to a 
procedure is fully-qualified (i.e. function iter in module M syntactically specifies 
its recursive call as M:iter() rather than simply iter()) then the new version 
of the function is called, if it is available. Only two versions of code may be 
available in the system at any given time; the current old version of code must 
be explicitly deleted (if any exists) before new code may be loaded, and certain 
library routines may be used to detect if the old code is still in use. 
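
The two-version rule can be mimicked in C to show its shape. This is an analogy only and is not how Erlang is implemented: module_t, module_load, module_purge and module_call are hypothetical names, and a fully-qualified call is modelled as an indirect call through the module's current slot.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*iter_fn)(int);

typedef struct {
    iter_fn current; /* target of fully-qualified calls */
    iter_fn old;     /* previous version, possibly still running */
} module_t;

/* Load new code. Fails (returns 0) while an old version still exists,
   mirroring the rule that at most two versions may be present. */
int module_load(module_t *m, iter_fn code) {
    assert(m != NULL && code != NULL);
    if (m->old != NULL)
        return 0;
    m->old = m->current;
    m->current = code;
    return 1;
}

/* Explicitly delete the old version so that new code may be loaded. */
void module_purge(module_t *m) {
    assert(m != NULL);
    m->old = NULL;
}

/* Fully-qualified call: always reaches the newest loaded version. */
int module_call(const module_t *m, int x) {
    assert(m != NULL && m->current != NULL);
    return m->current(x);
}

/* Three toy versions of the same function. */
int iter_v1(int x) { return x + 1; }
int iter_v2(int x) { return x + 2; }
int iter_v3(int x) { return x + 3; }
```

A caller that keeps a direct pointer to iter_v1 corresponds to an unqualified call and stays on the old version, while callers that go through module_call switch over as soon as the load succeeds.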

In Erlang, we could code our system in a straightforward manner but still 
replace its components at runtime. However, Erlang does not provide any auto- 
mated support for ensuring validity or security — the programmer must ensure 
reasonable timing and shape of updates. On the other hand, Erlang has language 
features that make this process more straightforward; 1) all data is write-once 
(no mutation), and 2) all thread-communication occurs via message passing. 
This effectively means that only one thread will ever “change” long-lived data 
(by passing a modified copy to its recursive call) , and all other threads may only 
access this data in some distilled form via message passing. In this way, essen- 
tially all function calls to other modules are stateless: the state carried around 
by a thread is in its argument list, and the only way to get at state managed by 
another thread is to pass it a message and receive its response (which is different 
than a function call). 

In general, we believe that the Erlang model of dynamic software updating 
is a good step towards facilitating evolution: it is simple and yet very flexible. In 
future work [8], we plan to generalize the updating notions in Erlang to less re- 
stricted environments (i.e., ones that allow mutation), to add further automated 
support (i.e. load-time type-checking), and to better formalize the programming 
patterns necessary to preserve correctness. We have begun to implement this 
sort of model in Typed Assembly Language [12]. 



Dynamic C++ Classes Hjalmtysson and Gray have designed and implemented 
mechanisms for the dynamic update of classes in C++ [13]. Their 
implementation requires the programmer to specially code classes that may be 
dynamically replaced using a proxy class Dynamic. Dynamic allows objects of 
multiple versions of a dynamic class to coexist: it maintains a pointer to the 
most recent version of a class, directing constructor calls to that class, while 
instance methods are executed by the class that actually created the object. 
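
The proxy arrangement can be approximated in plain C, used here only to stay consistent with the other listings; it is not the implementation of [13]. The names class_version_t, dynamic_install, dynamic_new and object_measure are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int version;
    int (*measure)(int w, int h); /* an instance method */
} class_version_t;

typedef struct {
    const class_version_t *cls; /* the version that created this object */
    int w, h;
} object_t;

/* Plays the role of the proxy: points at the most recent version. */
static const class_version_t *latest = NULL;

void dynamic_install(const class_version_t *c) {
    assert(c != NULL && c->measure != NULL);
    latest = c;
}

/* Constructor calls are directed to the most recent version. */
object_t dynamic_new(int w, int h) {
    assert(latest != NULL);
    object_t o = { latest, w, h };
    return o;
}

/* Instance methods run in the version that created the object. */
int object_measure(const object_t *o) {
    assert(o != NULL);
    return o->cls->measure(o->w, o->h);
}

/* Two toy versions of the same class. */
static int measure_v1(int w, int h) { return w + h; }
static int measure_v2(int w, int h) { return w * h; }
const class_version_t shape_v1 = { 1, measure_v1 };
const class_version_t shape_v2 = { 2, measure_v2 };
```

An object created before dynamic_install(&shape_v2) keeps using measure_v1 for its whole lifetime, while objects created afterwards use measure_v2.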

This project demonstrates the appeal of an object-oriented approach to dy- 
namic software updating: by using instance methods, an instance’s operations 
are consistent throughout its lifetime, even if a newer version of its class is loaded 
later. However, determining which set of static methods to use at runtime may 
be difficult, so the system prevents their replacement. This may be overly re- 
strictive, as all conceptually global data must be anticipated at deployment. 

The chief drawback of this approach for our purposes is the lack of safety of 
the C++ language. While the authors state that the loading of new classes 
preserves type safety if it exists, C++’s lack of strong typing makes it inappropriate 
for loading untrusted code. 



PODUS PODUS [20] (Procedure-Oriented Dynamic Update System), devel- 
oped by Mark Segal and Ophir Frieder, provides for the incremental update of 
procedures in a running program. Multiple versions of a procedure may coexist, 
and updates are automatically delayed until they are syntactically and semanti- 
cally sound (as determined by the compiler and programmer, respectively). This 
is in contrast to Erlang and Dynamic C++ classes, which allow updates to occur 
at any time. 

Updates are only permitted for non-active procedures. Syntactically active 
procedures are those that are on the runtime stack, and/or may be called by the 
new version of a procedure to be updated. Semantically related procedures are 
defined by the programmer as having some non-syntactic interdependency. Thus, 
if a procedure A is currently active, and is semantically related to procedure B, 
then B is considered semantically active. 
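
A simplified version of this activity test is sketched below. It covers only two of the conditions described above, presence on the runtime stack and a programmer-declared semantic relation to an active procedure, and omits the call-graph condition. The data layout and the name can_update are assumptions and are not taken from PODUS.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC 3

typedef struct {
    int on_stack[NPROC];       /* activation count of each procedure */
    int related[NPROC][NPROC]; /* related[a][b] != 0: declared by the programmer */
} podus_state_t;

/* Returns 1 if procedure b may be updated now, 0 if the update
   has to be delayed. */
int can_update(const podus_state_t *s, int b) {
    assert(s != NULL && b >= 0 && b < NPROC);
    if (s->on_stack[b] > 0)
        return 0; /* b itself is on the runtime stack */
    for (int a = 0; a < NPROC; a++)
        if (s->on_stack[a] > 0 && (s->related[a][b] || s->related[b][a]))
            return 0; /* b is semantically active through a */
    return 1;
}
```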

Updates to procedures are allowed to change type, as long as special inter- 
procedures are provided to mediate access; interprocedures are stubs that have 
the old type, perform some translation, and then call the new function at its 
new type. This is especially useful in AN, since code originates from different 
sources. Library functions may be updated to new interfaces even though their 
client code may not be available for change. 
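
Applied to the register_svc example of Section 4.1, an interprocedure could look like the following sketch. The simplified types, the table, and the choice of PLAN_UNKNOWN as the default for the missing argument are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { PLAN_UNKNOWN, PLAN_INT } plan_type_t; /* hypothetical */
typedef int (*svc_fun_t)(int);                        /* stand-in signature */

typedef struct {
    const char *name;
    svc_fun_t fun;
    plan_type_t return_type;
} svc_entry_t;

#define MAX_SVC 8
static svc_entry_t svc_table[MAX_SVC];
static size_t svc_count = 0;

/* New version of the procedure, at its new type. Returns 1 on success. */
int register_svc_v2(const char *name, svc_fun_t fun, plan_type_t return_type) {
    assert(name != NULL);
    if (svc_count == MAX_SVC)
        return 0;
    svc_table[svc_count].name = name;
    svc_table[svc_count].fun = fun;
    svc_table[svc_count].return_type = return_type;
    svc_count++;
    return 1;
}

/* Interprocedure: keeps the old type, translates the call by supplying
   a default for the missing argument, and calls the new version. */
int register_svc(const char *name, svc_fun_t fun) {
    return register_svc_v2(name, fun, PLAN_UNKNOWN);
}

/* Look up the recorded return type; PLAN_UNKNOWN if the name is absent. */
plan_type_t svc_return_type(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < svc_count; i++)
        if (strcmp(svc_table[i].name, name) == 0)
            return svc_table[i].return_type;
    return PLAN_UNKNOWN;
}
```

Old client code continues to call register_svc unchanged, while new clients call register_svc_v2 and supply type information.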

5 Conclusions 

Active network research to date has made great strides in defining a customizable 
network architecture; most projects add non-trivial flexibility to the use and 
administration of the network. However, no existing system truly solves the 
problem of slow, long-term network evolution, because the form of future updates 
is too restricted. In particular, most systems use plug-in extensibility as their 
means of loading code. In this paper, we have identified some of the shortcomings 
of plug-in extensibility with regard to system evolution, and have proposed a way 
to ease those restrictions in the form of dynamic software updating. 

While the topic is quite old, research into dynamic software updating is really 
in its early stages, and more experience is needed. Because of its applicability to 
many areas outside of active networks, we hope that more advances will be made 
in the coming years, to allow system engineers to construct systems simply that 
are nonetheless updateable. Work is especially needed to ensure that updates are 
applied to these systems in a safe and secure manner. We feel that this is one 
of the most important problems facing the active networking community today 
and plan to vigorously pursue it in future work [8]. 



Abstract. This paper proposes an Active Networks architecture for a 
VoIP Gateway. In the proposed architecture, a declarative language is 
used instead of a procedural language to describe the up-loaded 
program. This allows for a reduction in size of the up-loaded program, 
and increases the flexibility for describing the up-loaded program. 
Active Networks can then provide highly flexible services. An 
experimental system was implemented for feasibility studies. 
The specification of the declarative language used for describing the 
up-loaded program, the basic functionalities of an interpreter for the 
language, and an execution control program that executes component 
programs stored in the node beforehand were confirmed. 



1 Introduction 

This paper proposes an Active Networks architecture for a VoIP Gateway in which 
users describe their up-loaded programs in a declarative language. 

The following two issues are important for implementing Active 
Networks [1]. 

i) reduction in size of the program up-loaded to the node 

ii) prevention of reduced reliability caused by up-loaded program 

In the proposed architecture for Active Networks, a declarative language is used 
instead of a procedural language to describe the up-loaded program. This allows for a 
reduction in size of the up-loaded program, and increases the flexibility for describing 
the up-loaded program. Active Networks can then provide highly flexible services. 

In Active Networks, an unspecified number of users send their programs to the nodes. 
This may cause illegal system controls or feature interactions between users' 
up-loaded programs. Therefore, mechanisms for restricting illegal system controls 
and for automatically detecting feature interactions are required to preserve 
system reliability. 

An experimental system was implemented for feasibility studies. The specification of 
the declarative language used for describing the up-loaded program, the basic 
functionalities of an interpreter for the language, and an execution control program 
that executes component programs stored in the node beforehand were confirmed. 

In section 2, some characteristics of the proposed architecture are described. In 
section 3 software architecture is described. After the declarative language newly 
developed for describing up-loaded programs is explained, the interpreter for the 
language and the execution control program are discussed. In section 4, after 
introducing the experimental system, experimental results are discussed. In section 5, 
some issues of actual use are discussed. 



2 Characteristics for the Proposed Architecture 

Active Networks have just moved from the concept phase to the experimental phase, 
and many ways of constructing them have been proposed [2]. This paper proposes a 
new Active Networks architecture for a VoIP Gateway in which the up-loaded program 
is described in a declarative language, instead of a procedural language, and is 
validated automatically (Figure 1). The objectives of the architecture are to give 
users flexibility in describing programs, to reduce the size of the up-loaded 
program, and to ensure system reliability. These objectives generally conflict 
with one another. 




Because the specifications of telecommunication services can, as is well known, be 
described as state transition diagrams, we developed a declarative language in which 
users describe the conditions for state transitions and the system controls 
separately. As far as the authors know, no such architecture has been proposed 
before. More precisely, we added the ability to describe system control conditions 
to STR (State Transition Rule) [3], a declarative language developed by ATR 
(Advanced Telecommunications Research Institutes) for describing telecommunication 
service specifications. The enhanced language is called ESTR (Enhanced STR). 

STR was chosen for the following reasons: 
a) Users, who are not necessarily network experts, should be able to describe 
up-loaded programs. Moreover, the precise specifications of the program in the 
nodes are generally unknown to them. It is therefore better for users to describe 
only the conditions that matter to them. With a procedural language, users cannot 
describe up-loaded programs unless they know precisely the program in the node to 
which they up-load their programs. With STR, by contrast, users can describe 
their programs without knowing the program in the node. 
b) As an unspecified number of users send their programs to the nodes, feature 
interactions between their programs may occur. Therefore, the language in which 
users describe their programs should lend itself to detecting feature interactions. 

STR has been studied extensively with respect to these conditions, and some concrete 
methods have been proposed [4][5][6][7]. We therefore decided to enhance the STR 
specification so that it can also describe conditions for system controls (e.g. 
sending and receiving signals). 

An ESTR rule has the form of a Pre-condition, an event, and a Post-condition. The 
Post-condition describes the change of state and the system controls (e.g. 
sending/receiving signals and connection/disconnection of the path). Because 
conditions for state transitions can be described freely, the flexibility of 
services is preserved. Because the description of system controls is restricted to 
program components provided beforehand by the service provider, illegal system 
controls are avoided. 



[Figure 2 layers, in the order listed: Service Program; Execution Environment; 
Platform provided by a vendor] 

Fig. 2. Program Construction in the Node 

As some parts of the programs described using STR can be reused automatically 
for other services, the size of the up-loaded programs can be reduced. On the other 
hand, as the conditions for state transition can be described freely, the range of service 
description becomes much wider compared to AIN (Advanced Intelligent Network). 
As the descriptions of the conditions for system controls are interpreted by a system 
execution control program in an execution environment part and are restricted to 
initiate only the responsible programs stored in the node beforehand by the service 
provider, system reliability is preserved. 

As for the conditions for state transitions, degradation in network reliability is 
prevented by an automatic method for detecting inconsistencies with other programs. 
This problem is now called feature interaction and is recognized as one of the 
possible bottlenecks in software development. Much research on feature interactions 
has been done all over the world. STR has characteristics that support the 
detection of feature interactions. 

A validation server, as shown in Figure 1, detects feature interactions based on 
algorithms proposed in past research [4][5]. Up-loaded programs are validated 
at the validation server, where interaction detection programs are executed. If no 
interactions are detected, the programs are sent to the appointed node and are 
interpreted by an interpreter in the node. 

3 Program Structure 

(1) ESTR 

As described in section 2, ESTR is an enhancement of STR as a programming language 
for providing network services. Specifically, the conditions for state transitions 
serve as the conditions for rule application in the same way as in STR, and a part 
describing the system controls required for the state transition is added. 

An ESTR rule has the form of a Pre-condition, an event, and a Post-condition. A rule 
defines a condition for a state transition, the state change that occurs when the 
rule is applied, and the system controls required for the state transition. The 
Pre-condition consists of status description elements called primitives. Primitives 
are statuses of terminals, or relationships between terminals, that are targets of 
the state transition. An event is a trigger that causes the state transition, e.g. 
a signal input to the node or a trigger that occurs within the node. The 
Post-condition consists of two parts. One is the state description part, which also 
consists of primitives. The other is the system control description part, which 
lists the system controls required for the state transition. The system control 
description part is written in {} after the state description part, separated from 
it by ','. When no system controls are required, the content of {} is empty. 
A description example of ESTR is shown in Figure 3. 

call(x,y) connotify(y,x): talk(x,y),{Send(con,y,x),Con(x,y)} 

Fig. 3. An Example of ESTR 

The example in Figure 3 reads as follows. Terminals x and y are in the calling 
state, denoted by call(x,y). If terminal y goes off-hook, denoted by connotify(y,x), 
a Connect signal is sent to terminal x, denoted by Send(con,y,x), terminals x and y 
are connected, denoted by Con(x,y), and terminals x and y move to the talk state, 
denoted by talk(x,y). call(x,y) and talk(x,y) are called status primitives. 
All arguments in status primitives are written as variables so that a rule can be 
applied to any terminals. 

Current state of System -> Next state of System 
- Delete a state corresponding to the Pre-condition of the applied rule 
- Add a state corresponding to the Post-condition of the applied rule 

Fig. 4. System State Change by Rule Application 

When an event occurs, a rule that has the same event and whose Pre-condition is 
included in the system state is applied. When the rule is applied, the stored 
programs designated by the system control description part are executed. When the 
programs end normally, the system state changes as follows. A state corresponding 
to the Pre-condition of the applied rule is deleted from the current system state 
and a state corresponding to the Post-condition of the applied rule is added 
(Figure 4). Here, a state corresponding to a Pre/Post-condition is obtained by 
replacing the arguments in the Pre/Post-condition with actual terminals when the 
rule is applied. 
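
The state change of Figure 4 can be sketched in C for a rule whose variables have already been replaced by actual terminals. The representation is an assumption made for illustration: primitives are stored as strings, the system state is a small unordered set, and the execution of system controls is left out.

```c
#include <assert.h>
#include <string.h>

#define MAXP 16 /* maximum primitives in the system state */
#define PLEN 32 /* maximum length of one primitive, including the NUL */

typedef struct {
    char prims[MAXP][PLEN];
    int n;
} state_t;

/* Index of primitive p in the state, or -1 if absent. */
int state_find(const state_t *s, const char *p) {
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->prims[i], p) == 0)
            return i;
    return -1;
}

void state_add(state_t *s, const char *p) {
    assert(s->n < MAXP && strlen(p) < PLEN);
    strcpy(s->prims[s->n++], p);
}

void state_del(state_t *s, const char *p) {
    int i = state_find(s, p);
    if (i < 0)
        return;
    if (i != s->n - 1)
        strcpy(s->prims[i], s->prims[s->n - 1]); /* fill the gap */
    s->n--;
}

/* A rule with all terminal variables already instantiated. */
typedef struct {
    const char *pre[4];  int n_pre;
    const char *event;
    const char *post[4]; int n_post;
} ground_rule_t;

/* Apply r if the event matches and the whole Pre-condition is included
   in the system state. Returns 1 if the rule was applied. */
int apply_rule(state_t *s, const ground_rule_t *r, const char *event) {
    if (strcmp(r->event, event) != 0)
        return 0;
    for (int i = 0; i < r->n_pre; i++)
        if (state_find(s, r->pre[i]) < 0)
            return 0;
    for (int i = 0; i < r->n_pre; i++)
        state_del(s, r->pre[i]);  /* delete the Pre-condition state */
    for (int i = 0; i < r->n_post; i++)
        state_add(s, r->post[i]); /* add the Post-condition state */
    return 1;
}
```

For the rule of Figure 3 with x = A and y = B, a state containing call(A,B) becomes one containing talk(A,B) when connotify(B,A) arrives; primitives about other terminals are untouched.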

(2) Interpreter 

The functions of the interpreter, which selects and executes the rules described 
in (1), are briefly explained below (Figure 5). 




Fig. 5. Interpreter for ESTR 



When a packet is received, it is converted to an event in the Input Process part of 
the execution environment program, and the interpreter is initiated by being sent 
the event. The interpreter selects a rule whose event matches the event sent. The 
terminal variables in the Pre-condition of the rule are replaced by actual 
terminals, based on the actual terminals given as arguments of the event. If some 
arguments of the Pre-condition remain unreplaced, they are replaced as follows. The 
interpreter checks whether the system state it keeps contains a primitive that 
matches the Pre-condition primitive whose arguments remain unreplaced. If such a 
primitive exists in the system state, the arguments of the Pre-condition are 
replaced by the arguments of that primitive. 

When all arguments of the Pre-condition are replaced by actual terminals, the Pre- 
condition is checked to see if all the primitives of the Pre-condition exist in the system 
state. If they exist, the rule is selected as an applicable rule. If not, another rule which 
has the same event is searched for and checked in the same way as described above. 
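
The replacement of terminal variables can be sketched as a small matching procedure. The encoding is an assumption made for illustration: a primitive has a name and up to two one-character arguments, lowercase letters are variables, uppercase letters are actual terminals, and the Pre-condition is limited to a single primitive. The rule used in the test, talk(x,y) with event rel(x), is hypothetical and was chosen because y is not fixed by the event.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *name;
    char args[2]; /* 'x','y' in rules; 'A','B' in events and state; 0 = unused */
} prim_t;

typedef struct {
    char term[26]; /* term[v - 'a'] is the terminal bound to variable v, or 0 */
} binding_t;

/* Match a pattern containing variables against a ground primitive.
   On success the binding is extended and 1 is returned; on failure
   the binding is left unchanged and 0 is returned. */
int match(const prim_t *pat, const prim_t *fact, binding_t *b) {
    if (strcmp(pat->name, fact->name) != 0)
        return 0;
    binding_t trial = *b;
    for (int i = 0; i < 2; i++) {
        char v = pat->args[i], t = fact->args[i];
        if (v == 0 || t == 0) {
            if (v != t)
                return 0; /* different number of arguments */
            continue;
        }
        assert(v >= 'a' && v <= 'z');
        char *slot = &trial.term[v - 'a'];
        if (*slot == 0)
            *slot = t;    /* variable was still unreplaced */
        else if (*slot != t)
            return 0;     /* conflicts with an earlier replacement */
    }
    *b = trial;
    return 1;
}

/* Bind variables from the event first, then complete the binding from
   the system state. Returns 1 if the Pre-condition primitive is found. */
int bind_precondition(const prim_t *rule_event, const prim_t *event,
                      const prim_t *pre, const prim_t *state, int n_state,
                      binding_t *b) {
    memset(b, 0, sizeof *b);
    if (!match(rule_event, event, b))
        return 0;
    for (int i = 0; i < n_state; i++)
        if (match(pre, &state[i], b))
            return 1;
    return 0;
}
```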

When the rule to be applied is selected, the system control description part in the 
Post-condition of the rule is sent to the system control executing part in the 
execution environment program shown in Figure 6. If execution in the system control 
executing part ends normally, the system state is changed as follows. A state 
corresponding to the Pre-condition of the applied rule is deleted from the system 
state, and a state corresponding to the Post-condition of the applied rule is added. 
If the execution in the system control executing part ends abnormally, the system 
state is not changed. 

(3) System Control Executing Part 

The process in the system control executing part is explained briefly. 

On receiving the system control description part of a Post-condition from the 
interpreter, the system control execution program analyzes it and decides which 
programs are to be executed and in what order. The programs have been stored in 
the system beforehand by the provider. The system control execution program returns 
the execution results to the interpreter. 

4 Experimental System 

4.1 Basic Service 




Fig. 6. Experimental System 



The software structure of an experimental system for the proposed architecture is 
shown in Figure 6. An execution environment program, which consists of an Input 
processing part, an ESTR Interpreter and a System control executing part, is 
implemented on top of a commercially available IP Gateway Platform. A user 
program described using ESTR is executed on the Execution environment program. 
The protocol between terminals and the IP Gateway is ISDN. The system control 
description part shows conditions for controlling the IP Gateway based on the 
H.323 protocol (Figure 7) [8]. 

(1) ESTR Interpreter 

The ESTR Interpreter is initiated by receiving an event from the Input processing 
part. It selects a rule in a rule database and interprets the rule. The interpreter 
sends the system control description part of the rule to the System control 
executing part. When the interpreter receives execution results from the System 
control executing part, a state corresponding to the Pre-condition of the rule is 
deleted from the system state, and a state corresponding to the Post-condition of 
the rule is added to the system state. 

(2) Input processing part 

When the Input processing part receives a signal from the Platform, the Input 
processing part translates it to an event defined in ESTR Interpreter, and the event is 
sent to the Interpreter to initiate the Interpreter. 

(3) System control executing part 

When the System control executing part receives a signal (system control description 
part of the rule) from the Interpreter, it analyzes it and calls the appropriate API 
provided by the Platform to send signals to terminals or other nodes. 
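
The analysis and dispatch step can be sketched as follows. The component table, the trace buffer, and the function names are assumptions made for illustration; the components only record that they ran and do not call a real Platform API. The whole description is validated before anything is executed, so a description naming a component that the provider has not stored is rejected without side effects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAMELEN 16
#define ARGLEN 32

typedef int (*component_fn)(const char *args); /* 1 = normal end, 0 = abnormal */

char trace[64]; /* names of executed components, in order */

static int record(const char *what) {
    if (strlen(trace) + strlen(what) + 2 > sizeof trace)
        return 0;
    strcat(trace, what);
    strcat(trace, ";");
    return 1;
}

/* Stand-ins for the programs stored beforehand by the provider. */
static int comp_send(const char *args) { (void)args; return record("Send"); }
static int comp_con(const char *args)  { (void)args; return record("Con"); }
static int comp_disc(const char *args) { (void)args; return record("Disc"); }

static const struct { const char *name; component_fn fn; } components[] = {
    { "Send", comp_send }, { "Con", comp_con }, { "Disc", comp_disc },
};

static component_fn find_component(const char *name) {
    for (size_t i = 0; i < sizeof components / sizeof components[0]; i++)
        if (strcmp(components[i].name, name) == 0)
            return components[i].fn;
    return NULL;
}

/* Parse one "Name(args)" item at *p and advance *p past it and past an
   optional following comma. Returns 0 on malformed input. */
static int next_item(const char **p, char *name, char *args) {
    const char *open = strchr(*p, '(');
    const char *close = open ? strchr(open, ')') : NULL;
    if (open == NULL || close == NULL)
        return 0;
    size_t nlen = (size_t)(open - *p);
    size_t alen = (size_t)(close - open - 1);
    if (nlen == 0 || nlen >= NAMELEN || alen >= ARGLEN)
        return 0;
    memcpy(name, *p, nlen);
    name[nlen] = '\0';
    memcpy(args, open + 1, alen);
    args[alen] = '\0';
    *p = close + 1;
    if (**p == ',')
        (*p)++;
    return 1;
}

/* Execute the content of {} from a Post-condition, in the order written.
   Returns 1 on normal end, 0 otherwise. */
int execute_controls(const char *desc) {
    char name[NAMELEN], args[ARGLEN];
    const char *p;
    assert(desc != NULL);
    for (p = desc; *p != '\0'; ) /* pass 1: only stored components allowed */
        if (!next_item(&p, name, args) || find_component(name) == NULL)
            return 0;
    for (p = desc; *p != '\0'; ) { /* pass 2: run them in order */
        next_item(&p, name, args);
        if (!find_component(name)(args))
            return 0;
    }
    return 1;
}
```

An empty description, as in the last rule of Figure 8, executes nothing and ends normally.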

(4) Basic service 

The ESTR description of a normal route in the basic service of the VoIP Gateway is 
shown in Figure 8. In Figure 8, wtalert(x,y) represents that terminal x is awaiting 
an 'alert' signal from terminal y. called(x,y) represents that terminal y is called 
by terminal x. call(x,y) represents that terminal x is calling terminal y. talk(x,y) 
represents that the users of terminal x and terminal y are talking with each other. 
wtrelcomp(x) represents that terminal x is awaiting a 'release completion' signal 
from the network. wtrel(y) represents that terminal y is awaiting a 'release' signal 
from the network. Within a rule, the same terminal variable always represents the 
same terminal. Across different rules, the same terminal variable, e.g. x in rule 1 
and x in rule 2, does not necessarily represent the same terminal. Likewise, 
different terminal variables within a rule represent different terminals, but 
across different rules, different terminal variables, e.g. x in rule 1 and y in 
rule 2, do not necessarily represent different terminals. 

Fig. 7. H.323 Protocol 





setup(x,y), alert(y,x), disc(x,y), rel(x) and relcomp(x) are events. setup(x,y) 
represents receiving a 'setup' signal from terminal x to terminal y. alert(y,x) represents 
receiving an 'alert' signal from terminal y to terminal x. disc(x,y) represents receiving 
a 'disconnect' signal from terminal x to terminal y. rel(x) represents receiving a 
'release' signal from terminal x. relcomp(x) represents receiving a 'release complete' 
signal from terminal x. 

Send(s,x,y) represents sending terminal y a signal 's' from terminal x. Con(x,y) 
represents connecting terminal x and terminal y. Disc(x,y) represents releasing a 
connection between terminal x and terminal y. 

idle(x) setup(x,y): wtalert(x,y), {Send(calp,x),Send(setupnotify,x,y)} 

idle(y) setupnotify(x,y): wtalert(y,x),{Send(setup,x,y)} 

wtalert(y,x) alert(y,x): called(y,x),{Send(alertnotify,y,x)} 

wtalert(x,y) alertnotify(y,x): call(x,y),{Send(alert,y,x)} 

called(y,x) con(y,x): talk(y,x),{Send(connotify,y,x),Send(conack,y,x)} 

call(x,y) connotify(y,x): talk(x,y),{Send(con,y,x),Con(x,y)} 

talk(x,y) disc(x,y): wtrelcomp(x), {Disc(x,y),Send(rel,x),Send(discnotify,x,y)} 

talk(y,x) discnotify(x,y): wtrel(y),{Send(disc,x,y)} 

wtrel(x) rel(x): idle(x),{Send(relcomp,x)}

wtrelcomp(x) relcomp(x): idle(x),{} 

Fig. 8. Examples of ESTR Description 
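
The rule format used in Fig. 8 (pre-condition, event, post-condition, system control) can be pictured as a lookup table. The following C sketch is illustrative only and is not the Interpreter of the experimental system: it drops the terminal arguments, covers five of the rules, and the names estr_rule, basic_rules and estr_select are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified encoding of an ESTR rule. Terminal arguments are omitted,
   so each rule maps (state primitive, event) to (next primitive, actions). */
typedef struct {
    const char *pre;      /* pre-condition primitive, e.g. "idle" */
    const char *event;    /* triggering event, e.g. "setup" */
    const char *post;     /* post-condition primitive */
    const char *actions;  /* system control part, copied as text */
} estr_rule;

/* Five of the basic service rules from Fig. 8. */
static const estr_rule basic_rules[] = {
    { "idle",      "setup",   "wtalert",   "Send(calp,x),Send(setupnotify,x,y)" },
    { "wtalert",   "alert",   "called",    "Send(alertnotify,y,x)" },
    { "talk",      "disc",    "wtrelcomp", "Disc(x,y),Send(rel,x),Send(discnotify,x,y)" },
    { "wtrel",     "rel",     "idle",      "Send(relcomp,x)" },
    { "wtrelcomp", "relcomp", "idle",      "" },
};

/* Return the first rule whose pre-condition and event both match,
   or NULL if no rule applies in this state. */
const estr_rule *estr_select(const char *state, const char *event)
{
    assert(state != NULL && event != NULL);
    for (size_t i = 0; i < sizeof basic_rules / sizeof basic_rules[0]; i++) {
        if (strcmp(basic_rules[i].pre, state) == 0 &&
            strcmp(basic_rules[i].event, event) == 0)
            return &basic_rules[i];
    }
    return NULL;
}
```

For example, estr_select("idle", "setup") yields the rule whose post-condition is wtalert, while estr_select("idle", "rel") yields NULL because no rule has that combination.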

Figure 9 shows the signal flow from the reception of a setup signal to the sending of
a setup signal to a Gatekeeper.




Fig. 9. An Example of a Signal Flow 



4.2 Adding New Service 

In the experimental system, after confirming the basic service functionality, we added
an originating call screening service and a terminating call screening service to the
basic service under the following assumptions:

- ESTR rules for the basic service exist in the node.

- Program components for system control have been provided in the node.

- The Interpreter is not changed for new services.

(1) Originating Call Screening Service 

For the originating call screening service, if a setup signal is sent to a terminal that
is registered in the screening list as a screened terminal, the call connection is
rejected (Figure 10). The screening list can hold not only the directory numbers of
individual terminals but also special codes, for example to inhibit long-distance calls.

To add the originating call screening service, the following ESTR rules are needed
for the normal route:

idle(x),m-ocs(x,y) setup(x,y): wtrel(x),m-ocs(x,y), {Send(disc,y,x)} 
wtrel(x),m-ocs(x,y) rel(x): idle(x),m-ocs(x,y), {Send(relcomp,x)}

However, the second rule is not specific to the originating call screening service: it
has the same effect as a basic service rule. Therefore, the primitive m-ocs(x,y) can be
deleted from its Pre-condition and Post-condition, which gives:
wtrel(x) rel(x): idle(x),{Send(relcomp,x)}

This is the same rule as the basic service rule shown in Figure 8, so the second rule
does not need to be added. Consequently, only one rule has to be added to add the
originating call screening service to the basic service.
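
The effect of the single added rule can be sketched in C as follows. This is a hedged illustration, not the experimental system: the screening list contents, the prefix-matching treatment of special codes, and the function names m_ocs and next_state_on_setup are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical screening list for one originating terminal. An entry is
   either a full directory number or a special code treated as a prefix. */
static const char *const ocs_list[] = { "0355501234", "010" };

/* Plays the role of the primitive m-ocs(x,y): true if the called number y
   starts with any entry of the list (assumed matching policy). */
bool m_ocs(const char *callee)
{
    assert(callee != NULL);
    for (size_t i = 0; i < sizeof ocs_list / sizeof ocs_list[0]; i++) {
        if (strncmp(callee, ocs_list[i], strlen(ocs_list[i])) == 0)
            return true;
    }
    return false;
}

/* State reached from idle(x) on setup(x,y): the screening rule leads to
   wtrel(x), the basic service rule leads to wtalert(x,y). */
const char *next_state_on_setup(const char *callee)
{
    return m_ocs(callee) ? "wtrel" : "wtalert";
}
```

Under these assumptions a call to a number beginning with the code 010 ends in wtrel, after which the unchanged basic service rule for rel(x) returns the terminal to idle.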

(2) Terminating Call Screening Service 

For the terminating call screening service, if a call originates from a terminal that
is registered in the screening list as a screened terminal, the call connection is
rejected (Figure 11). Both directory numbers of individual terminals and area codes,
for rejecting calls from certain areas, can be registered in the screening list.

To add the terminating call screening service, the following ESTR rules are needed
for the normal route:

m-tcs(y,x),idle(y) setupnotify(x,y): m-tcs(y,x),idle(y), {Send(discnotify,y,x)} 
m-tcs(y,x),wtalert(x,y) discnotify(y,x): m-tcs(y,x),wtrel(x), {Send(disc,y,x)} 
m-tcs(y,x),wtrel(x) rel(x): m-tcs(y,x),idle(x), {Send(relcomp,x)}

However, the second and third rules are not specific to the terminating call screening
service: they have the same effect as basic service rules. Therefore, the
primitive m-tcs(y,x) can be deleted from their Pre-condition and Post-condition, which
gives:

wtalert(x,y) discnotify(y,x): wtrel(x),{Send(disc,y,x)} 
wtrel(x) rel(x): idle(x),{Send(relcomp,x)} 

The second rule is the same as the basic service rule shown in Figure 8. The first rule
is the same as the rule for the semi-normal route in the basic service. Therefore, these
rules do not need to be added. Consequently, only one rule has to be added to add the
terminating call screening service to the basic service.




Fig. 11. Terminating Call Screening Service 

4.3 Points Confirmed

In implementing the experimental system, the following points were confirmed.

- For the ESTR description: the basic service, the originating call screening service,
the terminating call screening service and other services, such as a call forwarding
service, can be described without any problems.

- For the Interpreter: it works correctly in all system states for the basic service,
the originating call screening service, the terminating call screening service and the
call forwarding service.

- For the system control executing program: it can call the appropriate APIs of the
Platform for the four services mentioned above.



5 Evaluation and Future Work 

5.1 Evaluation 

Dr. Yamamoto proposed the following three standpoints for evaluating Active 
Networks. 

a) inevitability: The service must inevitably be provided as a network service, or
it is more effective to provide it as a network service than as an
End-to-End service.

b) light load: The service does not add much overhead to router processing, in
terms of either memory or processing time.

c) locally working: It is impractical to suppose that all routers in the network are
implemented as Active Networks. Therefore, the service needs to be provided
under the condition that not all routers have the Active Network architecture.

These three standpoints are adopted for evaluation of the experimental system. 

a) inevitability: Considering the functionalities of VoIP gateway service, it is 
inevitable that the service be provided as a network service. 

b) light load: Though detailed evaluation is for future work, for the four services 
described, there do not seem to be major problems. 

c) locally working: Services related to either the originating terminal or the
terminating terminal can be provided as long as the node that accommodates
the terminal subscribed to the service has the proposed Active Networks
architecture. If the service is related to both the originating and the
terminating terminals, it can be provided as long as the nodes that
accommodate the originating terminal or the terminating terminal have the
proposed Active Networks architecture. Consequently, the service can be
provided in either case.

Next, the range of services provided is evaluated. Though AIN (Advanced
Intelligent Network architecture) is said to be able to customize services according to
customers' requirements, it cannot provide new services that customers require.
AIN can customize only a part of the services provided by network providers. On the
other hand, in the proposed architecture, customers, or users, can describe any service
within the range of system control functionalities provided beforehand by vendors.
Thus, the proposed architecture offers much more flexibility for adding services than
AIN.

5.2 Future Work

Some problems that remain for implementing commercial systems are described below.

(1) Guidance for primitives in ESTR 

Because up-loaded programs are written freely in ESTR by unspecified users, the
following problems arise with primitives. Different primitives may be used for the
same state description element, or the same primitive may be used for different
state description elements. The granularity of primitives may also differ from user to
user. Guidance for users in describing primitives is therefore required.

(2) System state description in an Interpreter 

The Interpreter records system states in order to select the rules to be applied.
However, when many terminals are accommodated in a node, it is unrealistic to
describe all primitives as one state, because the Interpreter must search the system
state for the primitives that match those of a rule. A method is required that picks
out only the primitives of the system state that are needed for selecting a rule.

(3) APIs provided by vendors

The granularity of the functionality described in the system control description part
of ESTR may not coincide with that of the APIs provided by a vendor; it may be larger
or smaller. In some cases, the functionality described in the system control
description part cannot be expressed as a combination of the available APIs.
Moreover, in a multi-vendor environment, APIs should be standardized. Otherwise, input
processing programs and system control executing programs have to be prepared for
every vendor.

(4) Validation server 

In the experimental system, the validation server was not implemented. Feature
interaction validation methods have been discussed worldwide [9], and many methods
have been proposed. However, there is a trade-off between validation time and
validation accuracy. Therefore, an efficient and accurate validation method is
required. The authors have proposed some methods based on STR [4]. We are now
investigating how to apply our research results on STR to ESTR.



6 Summary 

To achieve flexibility of service description, a reduction in the size of programs
up-loaded by users, and prevention of any loss of system reliability caused by
up-loaded user programs, Active Networks using the declarative language ESTR were
proposed. Though there are some problems to be solved for the implementation of a
commercial system, the proposed method has the potential to reduce the size of user
programs up-loaded to the nodes, to describe new services freely and to detect feature
interactions between user programs. As future work, we will continue to study the
four problems described in Section 5.2.

Abstract. Key requirements for Active Networks are their ability to
integrate heterogeneous components, to rely on widely accepted
programming models, to support availability in case of failure, and to
achieve good performance. However, those requirements
conflict: for example, using Java-based programming systems to provide
heterogeneity typically results in poor performance. We address this
problem by introducing a three-level architecture that a) makes a minor
change in the conventional programming language model by introducing
the remote storage class to support code distribution and its parallel
execution, b) uses a separate user interface to establish active sessions,
to specify the topology for a given distributed computation, and to
replicate the sessions, and c) achieves efficient run-time distribution
through a new mechanism, called active network calls (ANC), that
performs computations through low-cost asynchronous operations and
active capsules. Data flow and operation flow are separated, thus
allowing a high degree of parallelism.



1 Introduction 



How to add programmability to the network infrastructure has become the central 
question in a growing number of studies in recent years. A successful answer holds 
the promise of increasing the flexibility of existing networks and speeding up network 
innovation by allowing an easier design and implementation as well as a faster 
deployment of new services. Active networks and mobile agents are among the most 
radical approaches for achieving this goal. The central idea is to interpose 
computations within the network, thus enabling a new generation of networks that 
provide the programmer with the power to develop distributed network services and 
applications. Experiments show that a range of network services, in particular Internet
services, benefit from this new approach. Examples include improved flexibility and
security of network management through BBN's Smartpackets[1], better operability
through mobile private networking protocols offered by MIT's ANTS[2], distribution
of the computations along the delivery path of a message using the University of
Pennsylvania's SwitchWare[3], and dynamic firewall creation and routing configuration
using Columbia University's NetScript[4]. Criteria for evaluating the benefits
of these systems typically include:

1. Efficiency - the ability to deliver enhanced performance at an acceptable price for
the active technologies; 

2. Availability - the ability to survive in case of errors and crashes, and to deliver the 
requested service; 

3. Interoperability - the ability to execute in a network that is heterogeneous with 
respect to computing nodes as well as operating systems (a characteristic of today's 
Internet); 

4. Generality - in that the constructs of the programming environment are sufficient 
for a broad range of applications; 

5. Usability - in that the effort to learn and operate the system is limited; this typically 
requires that the environment be based on widely accepted programming languages 
avoiding machine or problem specific languages and models. 

These criteria impose conflicting requirements on the network design, e.g. using 
Java ensures interoperability, generality and usability, but results in poor performance. 
Similarly, replication of the programs to ensure availability requires special language 
constructs and therefore results in poor generality and usability. 

We propose a mechanism called Active Network Call (ANC), to move code over a 
network and process it on a set of remote computing nodes, which solves the above 
conflicting requirements separately, on three hierarchical layers. 

The first layer of ANC provides a simple mechanism for very efficient code
invocation using the active node concept. To do this, we use unified active capsules
with a direct addressing scheme for asynchronous remote data object management,
program invocation, and synchronization. We optimized the code so that marshalling
and demarshalling are performed asynchronously with respect to the procedure
invocation itself. As a result, it is considerably more efficient than known
middleware systems or Java-based mobile agent or active network systems.

The second layer of ANC provides an easy mechanism to spread (deploy) multiple
computations over the network and to create a specific topology (e.g. farm, ring,
pipeline), called here a computing session. A computing session can be replicated to
ensure higher availability, and higher performance through competition between the
replicas. An example is the use of replicated computing sessions for parallel and
reliable computations on a pool of workstations.

Additionally, at this layer we specify the distribution and the replication of the
code, in a way independent of the coding of the distributed program (done in the
third layer). The corresponding programming environment for distributing the
computations is called the late binder. It provides an interactive interface to create
replicated sessions, to specify the nodes on which the code should be executed,
to compile the source code on those nodes if necessary, and to start the local and the
remote programs. As a result, the local program interacts with multiple remote
programs asynchronously using unified active capsules and extracts as much
parallelism as possible. The management of the replicated sessions is provided by the
so-called replication proxy thread, which remains invisible to both the local and the
remote programs. To ensure interoperability, the late binder is implemented in three
versions: using JINI for a LAN environment (like a pool of workstations), using
the mobile agent system MOLE[5] for a WAN environment, and using the ftp and telnet
protocols for individual remote computers that do not support JINI or MOLE.

The third layer of ANC provides generality and usability through the well-understood
procedure call semantics of the programmer's interface (currently in terms
of C, C++), which hides the asynchronous nature of ANC and therefore requires no
explicit synchronization. Additionally, ANC hides the replication of the computations,
allowing parallel interaction with several remote programs, and one-to-many
interaction with replicated sessions. Thus, when coding the distributed
program, we declare only the data objects to be 'remote', and not the procedures
themselves as in RPC. A procedure becomes implicitly 'remote' only if it operates on a
set of data objects belonging to the same remote address space (i.e. computer). The
programmer can use the late binder at layer two to map this set of data objects to
several computers at the same time, and thus to specify a certain replication of the
remote procedures, or computing sessions.

The rest of this paper is organized as follows. The second section describes the 
concept of remote data objects and the way of coding of distributed programs using 
ANC. We also describe implementation of this ANC interface using the ANC call 
generator ancgen. The third section deals with the Late Binder to create computing 
sessions, to replicate them, to install and start the active applications on remote nodes. 
The fourth section deals with the implementation of the ANC applications using the 
active capsules and active nodes, and we consider especially our approach to optimize 
the code. Lastly, the fifth section describes how a distributed application used the 
developed portions of the ANC architecture to perform a calculation. This section 
also includes a comparison of the performance of ANC versus RPC in the application. 



2 Creating Distributed Applications with ANC 

One of the main problems in creating distributed programs is the programming
abstraction offered. Programmers of conventional programs prefer the consecutive
execution of instructions, or blocking procedure calls, provided for example by RPC
(remote procedure call). At the same time, in the real world (like the Internet) we deal
with non-deterministic network delays and partial failures, so that blocking
communication is impractical. However, implementing asynchronous non-blocking calls
requires additional language constructs for synchronization between the client and
server execution, provided for example by the languages for parallel programming. Our
goal is to provide a mechanism for easily deploying asynchronous computations
(procedures, programs) over the network, and to provide a simple mechanism for
synchronization of the results without any specialized language constructs.

2.1 Remote Data Objects

To provide this, we use as the central data abstraction the remote data object
introduced in CDC[6]. In programs written in C, we define the remote data to belong to
the storage class remote (similar to static and extern). From the programmer's point of
view, a remote data object belongs to a remote address space, i.e. to a remote
computing node. No further language constructs are required. The decision as to which
procedure should be executed remotely, and how to synchronize the results, is made
implicitly and automatically.

Our approach can be explained with an example in four steps:

Step 1: Remote objects definition. We define the structure request, and therefore
all data objects inside it, to belong to the remote address space; the variable result
is remote too:

remote struct {
    double p0;
    double q0;
} request;

remote double result;

double a1, a2, b1, b2;

During program execution the remote data objects request and result
will be created in the remote address space. Hash values of the variable names
are used to access the remote data objects.

An operation or a procedure on remote data objects will be executed remotely (in
the remote address space) only if all its arguments are of class 'remote'. Again, no
explicit language construct is required to mark a procedure or an operation as
remote, as the next three steps show.

Step 2: Initialization of the remote objects. We initialize the remote objects by: 

request.p0 = a1;

request.q0 = a2;

This operation will be executed locally because a1 and a2 are not 'remote'. During
program execution the values of a1 and a2 will be written to the remote variables
p0 and q0. We use XDR (RFC 1832) for marshalling and demarshalling the
arguments to the remote address space. This quite costly operation is done
asynchronously, before a remote operation on those objects is started.

Step 3: Execution of the remote program. 

In contrast, the procedure fractal will be executed remotely because all
the variables it uses are remote:

result = fractal(request.p0, request.q0);

This 'procedure call' is implemented by active capsules, similarly to ANTS[7]. The
active capsule carries only a pointer to the procedure, which is used directly to call
the procedure. No arguments are marshalled or demarshalled, no intermediate daemons
are contacted, and no procedure names are mapped. A copy of the result is
returned to a result thread at the client side, where it waits to be read during the
next synchronization step. This thread works as an active cache for remote results.

Step 4: Synchronization of the results. 

Similarly, the procedure setPixel will be executed locally because b1 and b2 are
local variables:

setPixel(b1, b2, result);

The execution of setPixel will read the result from the result thread and, say,
display it on the screen. The result thread ensures the efficient retrieval of the
result. Note that the remote procedure fractal is executed in parallel with the local
program until the synchronization is made. Strictly speaking, program
optimization would include invoking the remote program as early as possible
and synchronizing the results as late as possible. This could be done by the ANC
pre-compiler.



2.2 ANC Pre-compiler 

During program execution, some of the ANC system procedures will be called. They
can be explained in terms of the four steps from the previous section (Fig. 1):

1. The execution of step 1 (Remote object definition) involves the ANC_create()
call, which will create the corresponding data structures in the remote address
space.

2. The execution of step 2 (Initialization of the remote objects) involves the
ANC_write() call, which will write the initializing values into the remote data
structure, using XDR for marshalling and demarshalling.

3. The execution of step 3 (Execution of the remote program) involves the
ANC_call() call, which will invoke the remote procedure fractal by passing only
a pointer to this procedure and the names of the arguments to be used. A copy of the
result will be sent to the result thread.

4. The execution of step 4 (Synchronization of the results) involves the
ANC_read() call, which takes the result from the result thread.
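
To make the sequence of the four calls concrete, the following C sketch mocks them inside a single process. It is only an illustration under strong assumptions: the paper gives the call names but not their signatures, so every parameter list here is hypothetical, the "remote address space" is just a local table, everything runs synchronously, and the body of fractal is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OBJECTS 8

/* Stand-in for the remote address space: a table of named doubles. */
typedef struct { const char *name; double value; } remote_obj;
static remote_obj space[MAX_OBJECTS];
static size_t n_objects;

static remote_obj *lookup(const char *name)
{
    for (size_t i = 0; i < n_objects; i++)
        if (strcmp(space[i].name, name) == 0)
            return &space[i];
    return NULL;
}

/* Step 1: create a remote object (hypothetical signature). */
void ANC_create(const char *name)
{
    assert(name != NULL && n_objects < MAX_OBJECTS && lookup(name) == NULL);
    space[n_objects].name = name;
    space[n_objects].value = 0.0;
    n_objects++;
}

/* Step 2: write an initial value; a real system would marshal it with XDR. */
void ANC_write(const char *name, double v)
{
    remote_obj *o = lookup(name);
    assert(o != NULL);
    o->value = v;
}

/* Step 3: invoke a procedure through its pointer; only argument names are
   passed, the values are already in the remote table. */
typedef double (*anc_proc)(double, double);
void ANC_call(anc_proc proc, const char *result,
              const char *arg1, const char *arg2)
{
    remote_obj *r = lookup(result), *a = lookup(arg1), *b = lookup(arg2);
    assert(proc != NULL && r != NULL && a != NULL && b != NULL);
    r->value = proc(a->value, b->value);
}

/* Step 4: read the result back (here directly, without a result thread). */
double ANC_read(const char *name)
{
    remote_obj *o = lookup(name);
    assert(o != NULL);
    return o->value;
}

/* Placeholder body; the actual fractal computation is not given. */
double fractal(double p0, double q0) { return p0 * p0 + q0; }
```

A client would call ANC_create for request.p0, request.q0 and result, ANC_write for the two inputs, ANC_call(fractal, "result", "request.p0", "request.q0"), and finally ANC_read("result"). In the real mechanism the last call is the only point where the client may have to wait.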

The ANC pre-compiler analyzes the ANC program, coded in the way described in
the previous section, inserts the specific ANC calls (ANC_create(), ANC_write(),
ANC_call(), and ANC_read()) into the source code of the client and the server, and
creates several files used by the late binder to install and run the
distributed applications. The following passes are involved:

1. The ANC pre-compiler analyzes the remote data objects in the program and creates
an interface file program_x.x to be used by XDR to transfer the data structures to
the remote address space. Then it runs rpcgen to create the corresponding XDR
stubs.





50 Rumen Stainov and Joseph Dumont 



[Figure 1: two columns, Local Client (UNIX) and Remote Server. The remote address
space contains the ANC Active Messages Handler (AMH), the ANC Remote Object
Dispatcher (ROD) and the ANC Code Library (CL). Between the steps the client is
free to do something else.
1. Defining the remote objects: ROD creates the remote objects request (with
members p0, q0) and result.
2. Initializing the remote objects: ROD initializes the remote object request.
3. Calling a procedure with remote objects: AMH uses the message handle to select
the objects in ROD and run the code from the CL.
4. Using the remotely computed object: ROD returns the remotely computed object.]

Fig. 1. The ANC calls during the execution of the distributed program

2. It runs the ANC call generator ancgen, which generates a file with the specific
ANC calls (ANC_write(), ANC_read(), etc.). Those ANC calls are tailored to
the data structures and arguments to be passed.

3. It removes the keyword 'remote' from the source file, and inserts the required
ANC calls (ANC_create(), ANC_write(), ANC_call(), and
ANC_read()) into the client's code.

4. It creates client directories with the Makefiles, source codes, and executables to be
used by the late binder to move the code, to create the computing topology,
to replicate it if required, and finally to install and run it.



3 Creating Computing Sessions with the Late Binder 

The compilation of distributed programs with the ANC pre-compiler generates the
client programs and libraries, and specifies the programs to be executed in a remote
address space. However, it does not specify the computing nodes at which the remote
address spaces are located, how to compile the remote programs on different
operating systems, or how to replicate them to achieve high performance and
availability. A separate programming module and user interface allow the programmer,
interactively:

1. to map the remote address spaces to physical computing nodes;

2. to create a computing session representing a topology, like farm, chain, or ring;

3. to perform the compilation of the remote programs depending on the type of the
operating system and the hardware of the computing nodes;

4. to initialize replication of the computing sessions;

5. to specify the underlying transport protocol (UDP or TCP). When using
replicated computing sessions, we recommend UDP in order to reduce the
communication overhead and to make use of UDP broadcast;

6. to monitor the status of the execution of the remote programs and, if necessary, to
provide garbage collection.





Fig. 2. Example of consecutive execution of f1, f2, and f3 (farm topology) and late binding of
the 'remote' objects a, b, c, d, and f to the computers host1, host2, and host3

We refer to this process as late binding of the virtual active network to a physical 
network. 



3.1 Assigning Address Spaces to Computing Nodes 

In the example shown in figure 2 the remote data objects a, b, c, d, and f are
assigned through late binding to the nodes host1, host2, and host3, which implicitly
causes the procedures f1, f2, and f3 to be executed remotely.

In summary the new object type 'remote' defines which objects and procedures should 
be executed in a remote address space, and the late binding defines where this remote 
address space is located. 
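
The interplay of the two decisions can be sketched in C. The binding table below is hypothetical (the actual assignment drawn in figure 2 is not reproduced in the text), and the names binding, bound_host and procedure_host are invented for this illustration. The sketch encodes the rule that a procedure runs remotely only if all of its arguments are remote objects bound to the same address space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One late-binding entry: a 'remote' object name and the node it lives on. */
typedef struct { const char *object; const char *host; } binding;

/* Hypothetical session; the real mapping is chosen in the late binder. */
static const binding session[] = {
    { "a", "host1" }, { "b", "host1" },
    { "c", "host2" }, { "d", "host2" },
    { "f", "host3" },
};

/* Host of a remote object, or NULL if the name is not a remote object. */
const char *bound_host(const char *object)
{
    assert(object != NULL);
    for (size_t i = 0; i < sizeof session / sizeof session[0]; i++)
        if (strcmp(session[i].object, object) == 0)
            return session[i].host;
    return NULL;
}

/* Host on which a procedure over these arguments would run, or NULL when it
   must run locally (a local argument, or arguments on different hosts). */
const char *procedure_host(const char *const *args, size_t n)
{
    assert(args != NULL && n > 0);
    const char *host = bound_host(args[0]);
    for (size_t i = 1; host != NULL && i < n; i++) {
        const char *h = bound_host(args[i]);
        if (h == NULL || strcmp(h, host) != 0)
            host = NULL;
    }
    return host;
}
```

Rebinding every object to the client's own host name in such a table would correspond to a non-distributed execution, without any change to the program text.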

In a special case, the remote address space can be identical to the local address
space of the client, thus resulting in a centralized (non-distributed) execution (fig. 3).
In this case ANC introduces only a minimal overhead for context switching and
inter-process communication.

Furthermore, the combination of consecutive and nested calls on remote objects
allows different topologies to be defined for the distributed execution: farm, chain,
and ring (see fig. 4).

Fig. 3. Example of late binding of the 'remote' objects a, b, c, d, and f to the local computer
'client', which leads to a non-distributed execution



3.2 Replicating the Computing Sessions 

Another task of the late binder is to ensure replication of the computing session.
Replication can be used for higher performance and availability through computational
flooding, or for fault tolerance through voting. Computational flooding allows a client
to start two or more competing sessions for the same computation simultaneously, and
to take the result that is delivered first. For example, one can start a
computationally intensive image processing application on the local machine and
simultaneously on an Origin 2000 supercomputer node of the local network, and take
whichever result arrives first. Another example is to use pools of workstations, which
in general provide low-cost computing power, for computational flooding (fig. 5). The
client is not aware of the replication. It is initiated by the late binder, and
implemented by the replication thread. In the current implementation the replication
thread implements two policies:




Fig. 4. Example of nested execution of f1, f2, and f3 (ring topology), and late binding of the
'remote' objects a, b, c, d, e, and f to the computers host1, host2, and host3

Fig. 5. Example of 'three times' replicated execution of the computing session building a farm
of f1, f2, and f3 without voting



1. Take the first delivered result. The replicated computing sessions compete, and
the fastest computation wins. This masks temporary overload situations and crashes
in the computing nodes or in the communications, and achieves high performance and
availability.

2. Majority voting. The replication thread waits for a majority of identical results.
The idea is to mask hardware failures and to achieve fault tolerance.
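
The two policies can be sketched in C as follows. This is not the replication thread itself: the sketch assumes that the results (and, for the first policy, their arrival times) have already been collected into arrays, that "identical" means exact equality of the returned values, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Policy 1: index of the replica whose result was delivered first,
   given the arrival time of each replica's result in milliseconds. */
size_t first_delivered(const double *arrival_ms, size_t n)
{
    assert(arrival_ms != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (arrival_ms[i] < arrival_ms[best])
            best = i;
    return best;
}

/* Policy 2: majority voting. Stores the value returned by more than half
   of the n replicas in *winner and returns true, or returns false if no
   value reaches a strict majority. */
bool majority_vote(const double *results, size_t n, double *winner)
{
    assert(results != NULL && winner != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t votes = 0;
        for (size_t j = 0; j < n; j++)
            if (results[j] == results[i])
                votes++;
        if (2 * votes > n) {
            *winner = results[i];
            return true;
        }
    }
    return false;
}
```

The quadratic counting loop is chosen for clarity; with the handful of replicas typical of a replicated session its cost is negligible.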



3.3 Installation of the ANC Programs 

The part of the Late Binder, which provides the installation of the programs on the 
remote computers and monitors their execution (the so called installer) has been 
implemented in three versions: 

1. JINI installer for a LAN environment (like a pool of workstations). We use the JINI
lookup service to allow all existing JINI installers on a LAN to register with the
Late Binder at the client side. This provides the user with a list of available
workstations to be used for the code distribution. The Late Binder at the client side
uses JINI RMI (Java Remote Method Invocation) to move the code to the specified
computers, to compile it (if necessary), to start it, and to monitor its execution by
periodically sending 'are you alive' messages. After that, the Late Binder at the
client side starts the client program, together with the replication thread, and exits.
Note that JINI (Java) is used only for the initialization of the computing session. The
code execution itself is highly efficient because it uses active capsules (see next
section).

2. MOLE installer. The functionality of our MOLE installer is very similar to the JINI 
installer. We use the MOLE-based mobile agents to allow code distribution in a 
WAN environment; 

3. FTP/TELNET installer. We use the ftp and telnet protocols to set up ANC
applications on a list of remote computers that do not support JINI or MOLE.

4 Running ANC Programs



The ANC programs cooperate by exchanging unified active capsules. The ANC active
capsule has the structure shown in figure 6. When the ANC function call is
ANC_write() or ANC_read(), the active capsule carries data, which can be found
in the generic structure for the arguments. Marshalling or demarshalling is done using
XDR stubs.

If the ANC function call is ANC_call(), the fourth element specifies the direct
pointer to the remote program to be executed, and the third element identifies the
arguments by name. Note that no arguments are passed with the call, only the names
of arguments that have already been marshalled into the remote address space. Remote
pointers are used to invoke the remote ANC functions or procedures directly, without
any overhead for mapping, selection, scheduling, or invocation.

/* Please do not edit the following section; it was
   generated using ancgen. */

typedef struct data data_t;   /* struct for args */

struct amsg {                 /* struct for active capsule */
    long func;                /* pointer to remote ANC function */
    char var_name;            /* name of struct with arguments */
    unsigned int index;       /* pointer to remote struct with args */
    long proc;                /* pointer to remote procedure */
    data_t *data;             /* struct to hold the args */
};

typedef struct amsg amsg_t;



Fig. 6. Structure of the ANC active capsule. The first element, func, holds a direct pointer
to the remote ANC function to be called
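
The idea of invoking a procedure directly through a pointer carried in the capsule, with the arguments already resident on the receiving side, can be sketched as follows. The types args_t and capsule_t, the table blocks and the function handle_capsule are hypothetical simplifications of the structure in Fig. 6, and capsule and handler live in one process here, so the function pointer is trivially valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument block already written into the remote address space. */
typedef struct { double p0, q0, result; } args_t;

/* A procedure that operates in place on an argument block. */
typedef void (*anc_handler)(args_t *);

/* Simplified capsule: a direct procedure pointer plus the index of the
   argument block; no argument values travel with the call. */
typedef struct {
    anc_handler proc;
    unsigned int index;
} capsule_t;

#define N_BLOCKS 4
args_t blocks[N_BLOCKS];   /* stand-in for the remote object table */

/* Receiving side: call the procedure straight through the pointer,
   without name lookup or demarshalling of arguments. */
void handle_capsule(const capsule_t *c)
{
    assert(c != NULL && c->proc != NULL && c->index < N_BLOCKS);
    c->proc(&blocks[c->index]);
}

/* Example procedure used only for this illustration. */
void sum_proc(args_t *a) { a->result = a->p0 + a->q0; }
```

After blocks[1] has been filled in by earlier write capsules, handling the capsule { sum_proc, 1 } leaves the sum in blocks[1].result, from where a later read capsule could return it.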



5 Distributing a Calculation Using ANC (In Progress) 

An application was written to compare the performance of ANC with RPC. This
application distributes a matrix representing the temperature distribution of a square
slab to as many servers as the client requires. The calculation is relatively simple, but
must be done for each point in the interior of the slab (Figure 7): the new
temperature at an interior point is the average of the temperatures at
the four surrounding points. The servers calculate the temperature
distribution of the slab at time m+1 when given the distribution at time m. Each
server does this for a small portion of the matrix, which it receives from the client.

The client first partitions the matrix into as many segments as there are servers 
requested to perform the calculation. Once each server has received its subset of the 
original matrix, the servers perform the calculation in parallel and asynchronously. After the 
calculation is complete, each server returns the new results to the client. At this point 
the client is required to reconstruct a new matrix (representing time m+1) from the 
segments returned by the servers. 
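
The update rule and the row-wise partitioning described above can be sketched in C as follows. This is an illustration under stated assumptions, not the application's code: the function names relax_band and band_for_server are hypothetical, the matrix is assumed to be stored row-major, and each band is assumed to carry one read-only boundary row above and below its interior rows.

```c
#include <stddef.h>

/* One time step (m -> m+1) on a band of rows. `in` and `out` are row-major
 * arrays of rows x cols elements. Rows 0 and rows-1 are the extra boundary
 * rows and are only read; columns 0 and cols-1 are fixed edges. */
void relax_band(const double *in, double *out, size_t rows, size_t cols)
{
    for (size_t r = 1; r + 1 < rows; r++) {
        for (size_t c = 1; c + 1 < cols; c++) {
            /* New value = average of the four surrounding points. */
            out[r * cols + c] = 0.25 * (in[(r - 1) * cols + c] +
                                        in[(r + 1) * cols + c] +
                                        in[r * cols + c - 1] +
                                        in[r * cols + c + 1]);
        }
    }
}

/* Hypothetical partitioning rule: split the interior rows 1..n-2 of an
 * n x n matrix as evenly as possible among `servers`. Server k (0-based)
 * updates rows [first, first + count) and must also be sent the rows
 * first-1 and first+count as read-only boundary rows. */
void band_for_server(size_t n, size_t servers, size_t k,
                     size_t *first, size_t *count)
{
    size_t interior = n - 2;
    size_t base = interior / servers;
    size_t extra = interior % servers;  /* first `extra` servers get one more */
    *count = base + (k < extra ? 1 : 0);
    *first = 1 + k * base + (k < extra ? k : extra);
}
```

For a 10 x 10 matrix and two servers, this rule assigns interior rows 1 to 4 to the first server and rows 5 to 8 to the second; the client then copies the returned rows back into the matrix for time m+1.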

The application was written using some of the ANC functions listed in Figure 1. In 
order to compare the performance of ANC against RPC, the application was modified 
such that the ANC functions were replaced by RPC functions. Data were collected for 
four different matrix sizes, where the client used only 2 servers to aid in the 
calculation. The data, listed in Table 1, describe the average time it took for the client 
and the servers to calculate the temperature distribution from time m to time m+1. In 
other words, this is the time it takes for the client to distribute the matrix to the 
servers and receive the new results. Note, however, that the ANC implementation did not 
include the use of the remote qualifier to create objects on the servers, because this 
feature is still being developed. As shown in Table 1, the performance of the ANC 
implementation compares favorably with that of RPC for large sets of data: for the 
200 x 200 matrix, ANC took 1.139 s against 1.625 s for RPC, whereas RPC was faster 
for the smaller matrices. 

Fig. 7. As shown here, the temperature distribution is divided into two segments, one for each 
server used in a calculation. In each partition, one extra row of the matrix is passed so that 
each of the elements within the partition can be calculated for the m+1 iteration. It is the job of 
the client to take the data from each server and place it in the matrix correctly 

Additional data needs to be collected to compare the ANC architecture against 
similar architectures like RMI. Moreover, the comparison will need to be reevaluated 
once the remote qualifier and the ANC pre-compiler have been completed. 

Table 1. A comparison of the average time it took the client and the two servers to calculate 
the temperature distribution from time m to time m+1 using ANC and RPC. Errors are quoted 
in parentheses at the plus/minus 2-sigma level 



Matrix Size    ANC Implementation (s)    RPC Implementation (s) 
10 x 10        0.2019 (0.0004)           0.0120 (0.0003) 
50 x 50        0.289 (0.187)             0.1073 (0.036) 
100 x 100      0.460 (0.046)             0.4098 (0.005) 
200 x 200      1.139 (0.030)             1.625 (0.002) 

6 Conclusions 

We have presented a three-stage remote evaluation model for distributed computing 
that meets the conflicting requirements of generality, usability, performance, 
interoperability, availability, and efficiency. The first three requirements are 
addressed solely by defining data objects that belong to a "remote storage 
class", without any other language constructs for parallel and distributed 
programming. ANC separates the data flow from the operation flow, thus allowing a 
high degree of parallelism between the different phases of a remote program 
execution; i.e., posting the arguments to the remote address space, remote execution, 
and returning of the results can proceed in parallel with local computations. 
Interoperability and availability are provided by the Late Binder, which distributes the 
remote programs over active network nodes, creates the computational session, and 
ensures replication of sessions. Furthermore, ANC supports computational flooding, 
i.e. the ability to conduct multiple competing sessions that perform the same 
computation and to accept the first returned result. Efficiency is achieved through an 
extremely simple environment for processing the active capsules and the possibility of 
using UDP broadcast. We have implemented several ANC components, and the first 
computational experiments show very promising results. Future work includes testing 
the model with computationally intensive applications under various network 
conditions. 

New Mechanisms for Extending PLAN Functionality in Active Networks 

Abstract. PLAN (Packet Language for Active Networks) has emerged 
as an important execution engine for deploying active network 
applications. PLAN adopts a hybrid active packets and nodes 
architecture to provide safety, security, flexibility and performance. 
This paper presents some research results from the ongoing 
Secure Active VPN Environment (SAVE) project in active 
networks at Dalhousie University. PLAN has been selected as the 
execution engine for running experiments on our test bed. In this paper, 
we propose two novel mechanisms for extending the functionality of 
PLAN. First, we introduce a mechanism to deploy services in PLAN 
using a C-Bridge. This C-Bridge enables the programmer to create 
services in C, thus enhancing the efficiency, flexibility and accessibility 
of PLAN. The second mechanism presented consists of an 
enhancement in the way PLAN communicates with Java applications. 
The enhancement presented comes from fixing a possible oversight in 
PLAN and by improving memory management. 

Keywords. PLAN, Active Networks, Service Creation, Deployment, 
Java, OCaml, Secure Active VPN Environment (SAVE), C-Bridge 



1 Introduction 

Active networking is a new networking paradigm that inserts intelligence within the 
network by offering dynamic programming capability to network nodes as well as to 
packets travelling in the network [1, 2, 3, 4, 5, 6]. The result is a more flexible and 
powerful network that can be used to speed up the deployment of new applications. 
Applications that can benefit from active networks include network management, 
congestion control, multicasting and caching [6]. Different architectures have been 
researched for deploying active networks. Three kinds of active network 
architectures have emerged: active packets, in which the computations are limited to 
the packets travelling in the network; active nodes, in which the computational power 
comes from the nodes only; and the hybrid version which combines both active 
packets and active nodes approaches. According to [6], the hybrid architecture 
appears to be the most promising one. 

SwitchWare is a hybrid active network architecture developed at the University of 
Pennsylvania [7]. The key part of this architecture is PLAN (Packet Language for 
Active Networks) [8, 9, 10, 11, 12], a programming language used to create active 
packets. For the remainder of this paper, the terms PLAN and SwitchWare are used 
interchangeably and refer to the active network architecture that they define. 

The active packets component of the SwitchWare architecture is achieved through 
the PLAN language itself. On the other hand, the active nodes component is 
accomplished by services that are made available to PLAN programs and written in a 
general-purpose language. This hybrid two-level architecture offers flexibility and 
performance to applications using it. One of the important advantages of PLAN is 
that it has been designed with security as the first concern [13, 14]. A major part of 
this security comes from the limitations that were imposed on the language during its 
design. The other part comes from security-related services that are made available to 
PLAN programs as well as other services. 

This paper presents some results of an ongoing research project on active networks 
at Dalhousie University. The Secure Active VPN Environment (SAVE) project is 
part of a major project in Resource Management in High Speed Networks funded by 
the Canadian Institute for Telecommunications Research (CITR) and involves five 
different universities and five different telecommunications companies across Canada. 
We have built an active network test bed and have adopted PLAN as our execution 
engine. 

This paper presents two new mechanisms for extending the functionality of PLAN. 
First, we have implemented a C-Bridge for deploying services in PLAN. Those 
services are the means by which PLAN offers flexibility and performance to active 
packets. Until now, however, PLAN has supported only the functional programming 
language OCaml (Objective Caml) [15, 16] for creating such services. The 
C-Bridge presented in this paper allows the use of C instead of OCaml for implementing 
services. This can lead to more flexible and efficient services while making 
PLAN accessible to a larger group of programmers and developers. 

Second, we present an enhancement in the way PLAN communicates with Java 
applications. There are Java stubs provided with the PLAN distribution. These stubs 
are an important mechanism to provide flexibility, as they allow Java applications to 
interact with PLAN. However, we have discovered a possible oversight in the way 
these stubs convert certain PLAN data types. We explain the origin of this oversight 
and present a technique to fix it. The result is a stronger mechanism for integrating 
Java applications with PLAN. The new mechanism also improves memory 
management in these Java applications. 

The rest of the paper is organized as follows. Section 2 outlines the SAVE project 
test bed and some of the related research projects within the team. Section 3 gives 
more details on the PLAN architecture. Section 4 presents the motivation for the 
mechanisms presented in this paper. Section 5 explains the proposed C-Bridge 
mechanism. Section 6 presents the proposed enhancement regarding the 
communication of PLAN with Java applications. Section 7 provides concluding 
remarks and Section 8 outlines possible future work resulting from the mechanisms 
presented in this paper. 



2 Secure Active VPN Environment (SAVE) 

Our research in active networks at Dalhousie focuses on building a security 
architecture and investigating the performance of applications over active VPNs. We 
are considering two interesting research avenues: (a) how we can use active networks 
to deploy VPNs; and (b) how we can use VPNs to deploy effective active network 
applications. Design and deployment of VPN on demand, network management and 
secure multicasting are some of the sub-projects that are currently being researched 
by our group. The first phase of this project comprised the evaluation of different 
active network architectures. 

Fig. 1. SAVE's current test bed configuration 



The work that led to the results presented in this paper has been done on the test 
bed set up in our lab for these experiments and research. Figure 1 shows the layout of 
the test bed. It currently has five IA32 systems: three active PIII 450 128MB nodes 
running Debian Linux and two multimedia workstations running Windows NT4. 
Active network execution engines PLAN, ANTS and NetScript are available on the 
three active nodes. VPN tunnels can be created using IPSec FreeS/WAN and PPTP 
(Point-to-Point Tunneling Protocol). ANetD provides connection to the ABone and 
Internet access is via the Dalhousie network. Plans are underway for new equipment 
additions. In order to provide a controlled environment for our research and 
experiments, we have the capability of running the equipment on a local (traffic-free 
if we so choose) network. 



3 PLAN 

In this section, we present more details on the PLAN architecture. PLAN is a hybrid 
active packets and nodes architecture to build active networks. It uses a two-level 
hierarchy [17] to provide safety, security, flexibility, and performance. The lower 
level consists of the PLAN language itself, while the upper one, the service level, 
offers services to PLAN packets. PLAN packets travel through the network, and 
services reside on the nodes on which the packets are evaluated. 

The safety and security of this architecture rely on two principles. First, the PLAN 
language used to create active packets is safe and secure enough to let pure PLAN 
programs be evaluated with no authentication. The language is based on the simply 
typed lambda calculus, is expression-limited, and has bounded resources available. 
PLAN programs cannot communicate with one another and cannot use recursion or 
non-fixed-length iteration. Thus, all programs are guaranteed to terminate as long as they 
call services that terminate [9, 13]. 

The second principle is that if a service is able to change the state of the active 
node, then this service is responsible for authenticating the packet before evaluating 
it. Authentication is part of PLAN's core services, while authorization can be accomplished 
using Query Certificate Managers (QCM) [18]. QCM has also been developed 
as part of the SwitchWare project. 

Services are the ideal way to alleviate PLAN's limitations. Since they can be 
written in general purpose languages, services give all the flexibility and power 
needed to build strong distributed applications. However, until now, services could 
only be created in the language used by the PLAN implementation, i.e., OCaml for 
the latest PLAN version (v3.2). 

The following figures show the relations between the different entities comprising 
the PLAN/SwitchWare architecture: applications, PLAN packets and services. 
Figure 2 represents PLAN code that can be used to print the routes available at a 
distant node. Figure 3 shows how a getRoutes packet travels from an application to 
a destination node. The PLAN packet is first created by the application and 
injected into an active node. The active node on X first evaluates the packet and 
forwards it to Y. Y evaluates the packet; it is the destination node so it sends its 
routing table back to the application through the active node on X. 

svc RIPGetRoutes : void -> (host * host * dev * int) list 

fun printRoutes (routes : (host * host * dev * int) list) : unit = 
  ( print (routes) ) 

fun getRoutes (source : host, destination : host) : unit = 
  ( if (thisHostIs (destination)) then 
      OnRemote (|printRoutes| (RIPGetRoutes ()), source, getRB (), 
                defaultRoute) 
    else 
      OnRemote (|getRoutes| (source, destination), destination, 
                getRB (), defaultRoute) ) 



Fig. 2. Getting distant routes in PLAN 




Fig. 3. Flow of a getRoutes PLAN packet from an application on X to Y 



4 Motivation for New Mechanisms 

In this section, we present the motivation for creating new mechanisms to extend the 
functionality of PLAN. In order to extend the functionality of PLAN, one needs to 
create services and make them available to PLAN programs. In PLAN version 3.2, 
the only way provided to implement new services is through OCaml [19]. Although 
OCaml is a capable and powerful programming language, it is not as well known or 
as widely used as others such as C or Java. One may rule out PLAN as an active network 
solution simply because of unfamiliarity with OCaml. In fact, the PLAN web site [20] 
presents OCaml as a disadvantage for some people when opting for PLAN v3.2 instead 
of its PLAN v2.21 (Java) counterpart. 

A new mechanism for creating PLAN services using a common programming language 
would probably lead to a broadened use of PLAN as an active network solution 
by making it more accessible. Since version 3.2 is only available for Linux/UNIX 
machines, C appears to be a logical choice as a well-known programming language. 

Applications are another means to promote the use of PLAN. Once again, an application 
making use of PLAN is typically written in PLAN's implementation language, 
e.g. OCaml in version 3.2. However, Java stubs are provided with 
this version of PLAN to enable the use of Java for creating applications. This is quite 
useful since one can create powerful applications using Java. However, we have discovered 
a possible oversight in the way these stubs convert PLAN data types to Java 
ones and vice versa. We propose a way to fix this oversight in order to prevent corruption 
of data moving between a Java application and an active node. 



5 C-Bridge: Extending PLAN Services Using C 

This section presents a mechanism developed by the authors to implement PLAN 
services using C instead of OCaml. 

Objective Caml (OCaml) offers a built-in interface to integrate C functions within 
its source code [16]. This interface can be used to call a C function from an OCaml 
function and vice versa. In this way, creating C services and offering them to PLAN 
programs is theoretically possible. However, a PLAN service must have the 
following type: 



val service : Basis.active_packet * (Basis.value list) -> Basis.value 

In order to create a C-bridge, one needs to convert all PLAN values and types into 
data structures usable by C. The same remains true for the values returned by C 
functions; they must be of a valid PLAN type to let the function be properly typed as 
a service. We have designed a library to make a bridge between PLAN and C possible. 
Our library defines an equivalent in C to the PLAN values. It also provides the 
functions needed to perform marshalling/unmarshalling of PLAN values. These 
functions are based on the marshalling methods included with the PLAN version 3.2 
distribution. 

To use our bridge, one first needs to include the header file “PLANValue.h” in the 
service source code. Then, for every service definition, one may use the data types 
and structures provided in this file to specify the service’s arguments and return types. 
Figure 4 shows the PLAN data types and structures that are defined by the bridge. A 
PLAN service code using the bridge is almost the same as a regular C function, 
except that the parameters and return values are PLAN types. Figure 5 gives source 
code for a service generating random numbers. Except for the arguments and the 
return value, there is no restriction regarding the service source code. In this way, a 
service can use any C library to offer more functionality to PLAN programs. Since C 
is a commonly used programming language, there are many libraries available to 
programmers. Furthermore, C allows the development of highly optimized and 
efficient routines. Thus, PLAN programs could benefit from this performance. For 
example, if a PLAN packet containing a large payload is to be sent on a very slow 
link, it could call a compression service written using the C-Bridge. Such a service 
could make use of an existing efficient compression library for increased 
performance. However, when creating services using the C-Bridge, one should take 
the same care as s/he would with OCaml and follow the PLAN programming 
guidelines [21]. 



PLAN_UNIT 
PLAN_INT 
PLAN_CHAR 
PLAN_STRING 
PLAN_BOOL 
PLAN_HOST 
PLAN_BLOB 
PLAN_LIST 
PLAN_TUPLE 
PLAN_PORT 
PLAN_KEY 
PLAN_EXN 
PLAN_DEV 
PLAN_CHUNK 
PLAN_VALUE 



Fig. 4. PLAN data types and structures to use with the C-Bridge 



#include <stdlib.h> 
#include <time.h> 
#include "PLANValue.h" 

/*** 
 * Service used to initialize the random number generator. 
 **/ 
PLAN_UNIT InitRandom() 
{ 
    srand((unsigned) time(NULL)); 
    return UNIT; 
} 

/*** 
 * Service used to generate a random number in the range [0, 
 * maximum-1]. 
 **/ 
PLAN_INT Random(PLAN_INT maximum) 
{ 
    /* Guard against a zero or negative bound, for which the 
       modulo below would be undefined or meaningless. */ 
    if (maximum <= 0) 
        return (PLAN_INT) 0; 
    return (PLAN_INT) (rand() % maximum); 
} 



Fig. 5. Source code used to generate random numbers using C-Bridge 

Interfacing C and OCaml requires OCaml code to be written in order to call the C 
functions. C stubs are also needed to do the conversion between OCaml and C 
values. The bridge aims to simplify the development of services; if one needs to 
create OCaml and C stubs in addition to the implementation code for incorporating C 
services in PLAN, s/he might find the process harder rather than simpler. For this 
reason, we have also developed a wrapper tool for generating these stubs. This tool 
uses a definition file specifying the services to create, as long as those services use the 
bridge’s PLAN data types. The OCaml stubs generated by the wrapper are the ones 
that are registered as services to PLAN. When an OCaml stub is called, it first 
validates the arguments that are passed to the service; it then extracts those arguments 
and calls the C service. If the arguments do not have the proper type, a PLAN 
ExecException is raised and the C service is not called. 

A PLAN service can be polymorphic; that is, it can take any PLAN value as a parameter. 
The bridge supports this through the PLAN_VALUE data 
type. A PLAN_VALUE variable contains information about the actual data type 
along with the data itself. To create a polymorphic service, one has to 
check the actual data type at runtime and perform the proper computations on the 
data. Figure 6 shows example source code for a polymorphic service using 
the C-Bridge. 

#include "PLANValue.h" 

/*** 
 * Example of a polymorph service 
 **/ 
PLAN_UNIT polymorphService(PLAN_VALUE value) 
{ 
    switch (value.tag) 
    { /* Call the proper function... */ 
        case INT_TAG: 
            intService(value.val.iVal); 
            break; 
        case CHAR_TAG: 
            charService(value.val.cVal); 
            break; 
        case STRING_TAG: 
            stringService(value.val.szVal); 
            break; 
    } 
    return UNIT; 
} 



Fig. 6. Sample source code to create a polymorphic service using C-Bridge 
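
To make the idea of a value that carries its own type information concrete, the following C sketch shows one possible tagged-union layout. It is an assumption made for illustration only: the real definitions are in PLANValue.h and are not reproduced in this paper, so the names plan_tag_sketch, plan_value_sketch and value_size are hypothetical, and only three of the PLAN types are modelled.

```c
#include <string.h>

/* Hypothetical layout; the bridge's actual types are defined in PLANValue.h. */
typedef enum { INT_TAG, CHAR_TAG, STRING_TAG } plan_tag_sketch;

typedef struct {
    plan_tag_sketch tag;      /* says which member of `val` is valid */
    union {
        int         iVal;
        char        cVal;
        const char *szVal;
    } val;
} plan_value_sketch;

/* Example of runtime type inspection: returns the payload size in bytes,
 * or -1 when the tag is not one the function knows how to handle. */
long value_size(plan_value_sketch value)
{
    switch (value.tag) {
    case INT_TAG:
        return (long)sizeof value.val.iVal;
    case CHAR_TAG:
        return 1;
    case STRING_TAG:
        return (long)strlen(value.val.szVal);
    default:
        return -1;
    }
}
```

The default branch makes the handling of unexpected tags explicit, which is one way a polymorphic service can avoid silently ignoring a type it does not support.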

The current version of the bridge does not allow a C service to receive an active 
packet directly. If one needs to perform validation on the packet itself, s/he can do so 
by modifying the OCaml stubs directly. However, this feature is in the future 
enhancement list of the C-Bridge. In the same way, a link to the OCaml parser to 
create active packets is also in the scope of an updated version of the bridge. 



6 Using PLAN through a Java Application 

As mentioned earlier, PLAN provides safety, security and flexibility. The Java stubs 
included in PLAN version 3.2 distribution are an important mechanism to provide 
flexibility. These stubs, by converting PLAN data types into Java ones and vice 
versa, allow communication between Java applications and PLAN. However, as part 
of our PLAN evaluation, we discovered a possible oversight in the Java stubs that 
could lead to a fault or data corruption. This section presents an enhancement in the 
way PLAN can communicate with Java applications by proposing a fix for these Java 
stubs. 

The Java stubs included with the distribution can be used for writing host applications 
in Java to interface with a PLAN active node. These classes do not allow one to 
write services in Java [22]. One can use these classes for converting PLAN values 
into Java ones (and vice versa) as well as for creating PLAN packets by parsing and 
translating source code into the standard AST wire format used by PLAN. These 
stubs are interesting as they offer the user-friendliness of Java for end host applications 
and the flexibility and security of PLAN for the active network architecture. 

However, there is a possible oversight in the way the blob data type is converted 
between PLAN and Java. The blob type represents a stream of bits in PLAN. 
OCaml, like C, does not have a byte data type. However, a byte can be represented 
by the character data type. In this way, a blob value is represented as a string (a finite 
sequence of characters [16]) in PLAN's OCaml implementation. 

According to [16], “character values [in OCaml] are represented as 8-bit integers 
between 0 and 255. Character codes between 0 and 127 are interpreted following the 
ASCII standard. The current implementation interprets character codes between 128 
and 255 following the ISO 8859-1 standard.” 

In the PLAN Java stubs, the blob data type is also represented as a string (the Java 
String class). A Java string is also a finite sequence of characters. However, a 
character in Java is not the same as one in OCaml. Java uses Unicode as its native 
character encoding, which requires 16 bits per character for a total of 65,536 possible 
characters [23]. Here is the description of the String constructor used in the stubs for 
creating a blob, as found on the official Java SDK documentation web site [24]: 

“String(byte[] bytes) 

Construct a new String by converting the specified array of bytes using the 
platform's default character encoding.” 

In this way, if the platform on which the Java application is running uses a 
different encoding than ISO 8859-1, the bytes may be converted and thus, the data 
gets modified. Furthermore, if the byte cannot be converted to a known character in 
the platform’s encoding, a replacement character substitutes it, usually ‘?’ [25]. We 
have encountered this problem when we tried to stream WAVE files from a French 
Canadian operating system using the stubs; many ‘?’ characters were introduced in 
the files, thus altering the sound by introducing noise. Our experimental results 
suggest that this problem may occur when the stubs are used on non-English 
operating systems. As PLAN becomes more universally adopted, it may be important 
to fix this oversight. 
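
The failure mode can be reproduced in miniature with the C sketch below. It is a simulation and not the stubs' code: the function name lossy_round_trip is hypothetical, and the platform encoding is assumed, for simplicity, to know only the codes 0 to 127, so that every other byte is replaced by '?' when the binary data passes through a character conversion.

```c
#include <stddef.h>

/* Simulates a byte -> character -> byte round trip through a charset that
 * only knows codes 0..127 (an assumption chosen for simplicity). Unknown
 * bytes come back as '?', as a lossy decoder would produce. Returns the
 * number of bytes that were altered. */
size_t lossy_round_trip(const unsigned char *in, unsigned char *out, size_t n)
{
    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = (in[i] < 0x80) ? in[i] : (unsigned char)'?';
        if (out[i] != in[i])
            changed++;
    }
    return changed;
}
```

Applied to audio samples, every altered byte is a corrupted sample value, which is consistent with the noise observed in the streamed WAVE files. The substitution is also irreversible, since the receiver cannot tell an original '?' byte from a replacement.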

Fixing this problem is simple. Recalling that a blob is a stream of bits, Java provides 
a data type fitting this definition: the byte array. A Java byte is an 8-bit integer 
data type. All we have to do, then, is replace the definition and marshalling 
functions of the blob data type in the Java stubs. The following figures show the 
modifications needed. These modifications also bring another benefit: by 
replacing a string with a byte array, we need half the memory space to store a blob in a 
Java application, since a character uses 16 bits while a byte requires only 8 bits. 

Note that new methods are also needed (but not displayed in the figures) in order to 
perform marshalling/unmarshalling of the new blob type. 

public static Value Blob(String _arg0) { 
    return new Blob(_arg0); 
} 

public static class Blob extends Value { 
    public String _arg0; 

    public Blob(String _arg0) { 
        super(Blob_tag); 
        this._arg0 = _arg0; 
    } 

    public void marshal(OutputStream os) throws IOException { 
        super.marshal(os); 
        Marshalling.putString(os, _arg0); 
    } 
} 

public static Value unmarshal(InputStream is) throws IOException { 
    ... 
    case Blob_tag: 
    { 
        String _arg0 = Marshalling.getString(is); 
        return Blob(_arg0); 
    } 
    ... 
} 

Fig. 7. Original source code regarding blob type in the Java stubs 

A similar problem could potentially occur regarding the conversion of strings. 
OCaml recognizes only the ISO 8859-1 character set, but a string coming from Java 
can be in a different one. This could lead to an undesired modification of the data. 
Hence, we should probably convert all strings to the ISO 8859-1 character set in Java 
when dealing with PLAN. This can be accomplished by using the following String 
constructor: 

String(byte[] bytes, String enc) 

The enc parameter should be "8859_1" to remain consistent with PLAN. On the 
other hand, if the platform's current encoding is important to the application, the 
conversion to ISO 8859-1 would not be desirable. Our suggestion is that a new 
constructor for the PString PLAN data type in the Java stubs should be added to 
create "standard" ISO 8859-1 strings to allow compatibility between Java applications 
and others in different languages. 
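
Why ISO 8859-1 is a safe common ground can be seen from a short C sketch. It relies on one known fact, namely that each ISO 8859-1 byte value is numerically equal to its Unicode code point; everything else is illustrative, and the function names latin1_decode and latin1_encode are hypothetical. Java characters are modelled as 16-bit code units.

```c
#include <stddef.h>
#include <stdint.h>

/* An ISO 8859-1 byte with value b corresponds to Unicode code point b,
 * so decoding is a plain widening to a 16-bit code unit. */
void latin1_decode(const unsigned char *in, uint16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Encoding back succeeds for code units up to 0xFF. Larger ones cannot be
 * represented in ISO 8859-1 and are replaced by '?'. Returns the number of
 * replacements made. */
size_t latin1_encode(const uint16_t *in, unsigned char *out, size_t n)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] <= 0xFF) {
            out[i] = (unsigned char)in[i];
        } else {
            out[i] = (unsigned char)'?';
            replaced++;
        }
    }
    return replaced;
}
```

Any byte sequence survives the decode and encode round trip unchanged, whereas a character outside the ISO 8859-1 range (for example U+20AC) is lost on encoding. This is the trade-off noted above for applications that depend on the platform's own encoding.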

public static Value Blob(byte[] _arg0) { 
    return new Blob(_arg0); 
} 

public static Value Blob(byte[] _arg0, int iPos, int iLen) { 
    return new Blob(_arg0, iPos, iLen); 
} 

public static class Blob extends Value { 
    public byte[] _arg0; 

    public Blob(byte[] arg0) { 
        super(Blob_tag); 
        _arg0 = new byte[arg0.length]; 
        System.arraycopy(arg0, 0, _arg0, 0, arg0.length); 
    } 

    public Blob(byte[] arg0, int iPos, int iLen) { 
        super(Blob_tag); 
        _arg0 = new byte[iLen]; 
        System.arraycopy(arg0, iPos, _arg0, 0, iLen); 
    } 

    public void marshal(OutputStream os) throws IOException { 
        super.marshal(os); 
        Marshalling.putBlob(os, _arg0); 
    } 
} 

public static Value unmarshal(InputStream is) throws IOException { 
    ... 
    case Blob_tag: 
    { 
        byte[] _arg0 = Marshalling.getBlob(is); 
        return Blob(_arg0); 
    } 
    ... 
} 



Fig. 8. Modified source code regarding blob type in the Java stubs 

7 Conclusion 

Active networks have the potential to enable the deployment of new network services. 
By inserting intelligence in the network, this emerging technology offers the 
flexibility needed to adapt to mobile topologies and constant changes in delivery 
requirements. Wireless and multicasting communications will become basic 
requirements in tomorrow's Internet, and we feel that active networks can play a major 
role in it. Before reaching that goal, security and performance issues in active 
networks still need to be addressed. However, new architectures that aim to offer 
security, flexibility and performance are being developed, and PLAN is one of them. 
PLAN offers many interesting features and has what is needed to be a reference in 
active networking. 

In this paper, we have presented two research results regarding the functionality of 
PLAN. C-Bridge enables the implementation of PLAN services using C. Extending 
the functionality of PLAN using C allows the creation of more powerful and flexible 
services by offering direct access to efficient built-in libraries. One example 
presented is the use of an efficient compression library before sending a 
packet over a very slow link. Offering well-known programming languages for the 
creation of PLAN services can also broaden the use of PLAN by making it accessible 
to more potential users. This result can also lead to the use of C++ and Java as other 
solutions for creating PLAN services. 

The second result introduced by this paper proposes an improvement in the way 
PLAN communicates with a Java application. We have revealed a possible oversight 
in the Java stubs included in the PLAN distribution. We showed, based on our own 
experience, how this oversight can lead to a fault or data corruption for specific data 
types on non-English operating systems. We explained the origin of the problem and 
presented techniques to fix it. The fix proposed also leads to a better memory 
management model for Java applications integrating PLAN. 



8 Future Work 

We have presented a way to extend PLAN functionality by the creation of PLAN 
services written in C instead of OCaml. We also know that one is able to create Java 
applications to interact with PLAN using the Java stubs included with the distribution. 
However, we do not have mechanisms to create applications written in C and services 
written in Java. 

The stubs cannot be used to implement services in Java because 
there is no bridge between OCaml and Java so far. However, such a bridge could be 
built using C-Bridge along with Java Native Interface (JNI) [26], which allows the 
manipulation of the Java Virtual Machine (JVM) from C functions. Accessing C 
functions from Java is also possible using JNI. Hence, building a bi-directional 
bridge between PLAN and Java is feasible. 

To create applications written in C, one could use our bridge for converting data 
types. The major issue remains the creation of PLAN packets, especially parsing 
PLAN code in C. The first solution that comes to mind is the creation of a PLAN 
parser in C. However, one could avoid this step by making use of the OCaml parser or 
Java parser through JNI. 

The C-Bridge also makes it possible to use C++ for implementing services. One 
simply needs to map C++ public methods to C functions and use them as PLAN 
services. 
Abstract. We have previously proposed, implemented and demonstrated 
an Application Layer Active Network (ALAN) infrastructure. 
This infrastructure permits the dynamic deployment of active services 
in the network, but at the application level rather than the router level. 
Thus the advantages of active networking are realised, without the dis- 
advantages of router level implementation. However we have previously 
left unsolved the issue of appropriate placement of ALAN supported ser- 
vices. This is an Application Layer Routing problem. In this paper we 
define this problem and show that, in contrast to IP, it is a multi-metric 
problem. We then propose an architecture that helps conceptualise the 
problem and build solutions. We propose detailed approaches to the ac- 
tive node discovery and state maintenance aspects of Application Layer 
Routing (ALR). 



1 Introduction 

We have previously proposed, implemented and demonstrated an approach to 
active networks based on Application Layer Active Networking (ALAN) [3]. We 
believe that this approach can achieve many of the benefits ascribed to active 
networks without the considerable drawbacks evident in the implementation of 
active networks in IP routers. 

Our approach has been validated by developments in the commercial Internet 
environment. It is the case that some Internet Service Providers (ISPs) will 
support servers at their sites supplied by third parties, to run code of their (3rd 
parties’) choice. Examples include repair heads [26], which are servers in the 
network which help with reliable multicast in a number of ways. In the reliable 
multicast scenario an entity in the network can perform ACK aggregation and 
retransmission. Another example is Fast Forward Networks Broadcast Overlay 
Architecture [27]. In this scenario there are media bridges in the network. These 
are used in combination with RealAudio [28] or other multimedia streams to 
provide an application layer multicast overlay network. 

We believe that rather than placing “boxes” in the network to perform spe- 
cific tasks, we should place generic boxes in the network that enable the dynamic 
execution of application level services. We have proposed an ALAN environment 
based on Dynamic Proxy Servers (DPS). In our latest release of this system we 
rename the application layer active nodes of the network Execution Environ- 
ments for Proxylets (EEPs). By deploying active elements known as proxylets 
on EEPs we have been able to enhance the performance of network applications. 

In our initial work we have statically configured EEPs. This has not addressed 
the issue of appropriate location of application layer services. This is essentially 
an Application Layer Routing (ALR) problem. 

For large scale deployment of EEPs it will be necessary to have EEPs 
dynamically join a mesh of EEPs, with little to no configuration. As well as EEPs 
dynamically discovering each other, applications that want to discover and use 
EEPs should also be able to choose appropriate EEPs as a function of one or 
more forms of routing metric. Thus the ALR problem resolves to an issue of 
overlaid, multi-metric routing. 

This paper is organised as follows. We first describe our ALAN infrastruc- 
ture by way of some application examples. These examples reveal the Application 
Layer Routing issues that require solution. We then propose an architecture that 
aids conceptualisation and provides a framework for an implementable solution. 
This paper concentrates on the issues of node (EEP) discovery and state mainte- 
nance. We conclude by referencing related work and describing our future work, 
which is already in progress. 



2 Application Layer Active Networking 

Our infrastructure is quite simple and is composed of two components. Firstly 
we have a proxylet. A proxylet is analogous to an applet [29] or a servlet [30]. 
An applet runs in a WWW browser and a servlet runs on a WWW server. In 
our model a proxylet is a piece of code which runs in the network. The second 
component of our system is an Execution Environment for Proxylets (EEP). 
The code for a proxylet resides on a WWW server. In order to run a proxylet 
a reference is passed to an EEP in the form of a URL and the EEP downloads 
the code and runs it. The process is slightly more involved but this is sufficient 
detail for the current explanation. 

In our initial prototype system, “funnelWeb”, the EEP and the proxylets are 
written in Java. Writing in Java has given us both code portability as well as 
the security [31] of the sandbox model. The “funnelWeb” [32] package runs on 
Linux, Solaris and Windows NT. 

2.1 WWW Cache Proxylet 

We have written a number of proxylets to test our ideas. Possibly the most 
complicated example has been a webcache proxylet [33]. In this example a webcache 
proxylet is co-resident with a squid cache [34]. WWW browsers typically support 
a configuration option which allows all accesses to be via a cache. This option 
is normally set by the systems administrator installing a browser. It is possible 
to configure this variable to point to our webcache proxylet instead, so that all 
browser requests can be monitored by the webcache proxylet. This is shown in 
Fig. 1. 

The webcache proxylet does not actually perform any caching, relying instead 
on the co-resident cache. What it can do however is perform some transforma- 
tions on the retrieved page such as transcoding or compression. This can be 
performed as a function of the mime content type of a requested URL. 




Fig. 1. Text Compression 



For example, for large text files it might be sensible to compress the file 
before it is transmitted. Rather than do this at the WWW server, we try to 
identify an EEP which is close to the WWW server. A compressor proxylet 
is then sent to that EEP, which downloads the page from the WWW server 
and compresses it. The compressed text is received by a decompressor proxylet 
launched by the webcache proxylet close to the WWW client. This proxylet 
decompresses the page and returns it to the WWW browser. In our experiments 
we were able to compress the data for the transcontinental (low bandwidth) part 
of a connection. This improves both the latency and cost of the download. This 
process is illustrated in Fig. 1. 

A limitation of our initial experiments is that the locations of the various 
EEPs are known a priori by the webcache proxylet. Another problem is that 
it is not always clear that it is useful to perform any level of compression. For 
example if a WWW server is on the same network as the browser it may make 
no sense to attempt to compress transactions. 
From this application of proxylets two clear requirements emerge which need 
to be satisfied by our application layer routing infrastructure. The first require- 
ment is to return information regarding proximity. A question asked of the rout- 
ing infrastructure may be of the form: return the location of an EEP which is 
close to a given IP address. Another requirement could be the available band- 
width between EEPs, as well as perhaps the bandwidth between an EEP and 
a given IP address. An implied requirement also emerges. It should not take 
more network resources or time to perform ALR than to perform the native 
transaction. 

Given this location and bandwidth information the webcache proxylet could 
now decide if there were any benefit to be derived by transcoding or compressing 
a transaction. So a question asked of the application layer routing infrastructure 
may be of the form: find an EEP “close” to this network and also return the 
available bandwidth between that EEP and here. 
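
As a rough illustration of the decision described above, the following Python sketch shows how a webcache proxylet might combine the MIME type of a page with an answer from the ALR infrastructure. The `AlrAnswer` record, the function name and both thresholds are hypothetical; the paper does not define a query interface or any numeric limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlrAnswer:
    """Hypothetical reply from the ALR infrastructure to one query."""
    eep: str                # name of an EEP close to the WWW server
    bandwidth_kbps: float   # available bandwidth between that EEP and "here"
    same_network: bool      # True if server and browser share a network


def should_compress(mime_type: str, size_bytes: int,
                    answer: Optional[AlrAnswer],
                    min_size: int = 16_384,
                    slow_link_kbps: float = 256.0) -> bool:
    """Return True if launching compressor proxylets looks worthwhile.

    Both thresholds are illustrative assumptions, not measured values.
    """
    if answer is None or answer.same_network:
        return False    # no suitable EEP, or nothing to gain locally
    if not mime_type.startswith("text/"):
        return False    # this sketch only considers text content
    if size_bytes < min_size:
        return False    # small pages do not repay the set-up cost
    return answer.bandwidth_kbps <= slow_link_kbps
```

A real proxylet would presumably also weigh the cost of the ALR query itself, in line with the implied requirement that ALR should not cost more than the native transaction.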

2.2 TCP Bridge 

One of our simplest proxylets is a TCPbridge proxylet. The TCPbridge proxylet 
runs on an EEP and accepts connections on a port that is specified when the 
proxylet is started. As soon as a connection is accepted a connection is made 
to another host and port. This proxylet allows application layer routing of TCP 
streams. It could obviously be extended to route specific UDP streams. 
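
The behaviour of such a bridge can be sketched in a few lines of Python. This is not the funnelWeb proxylet (which is written in Java); the function names are hypothetical and error handling is reduced to the minimum needed for the relay to terminate cleanly.

```python
import socket
import threading


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches end of stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # peer reset or socket closed: stop relaying
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # propagate end of stream
        except OSError:
            pass


def start_bridge(target_host: str, target_port: int,
                 listen_host: str = "127.0.0.1",
                 listen_port: int = 0) -> socket.socket:
    """Accept connections and relay each one to (target_host, target_port).

    Returns the listening socket so the caller can read the chosen port.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((listen_host, listen_port))
    listener.listen()

    def accept_loop() -> None:
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return  # listening socket is no longer usable
            try:
                upstream = socket.create_connection((target_host, target_port))
            except OSError:
                client.close()  # target unreachable: drop this client
                continue
            # One thread per direction gives a full-duplex relay.
            for a, b in ((client, upstream), (upstream, client)):
                threading.Thread(target=_pump, args=(a, b), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

Chaining several such bridges, each on a different EEP, yields the segmented TCP path discussed in the next paragraph.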

We have experienced the benefits of a TCPbridge by using telnet to remote 
log in to computers across the globe. A direct telnet from one computer to an- 
other across the global Internet may often experience very poor response times. 
However if one can chain a number of TCP connections (essentially, source rout- 
ing) by logging into intermediate sites, better performance is typically achieved. 
This is because the segmented TCP connections respond more promptly to loss, 
and do not necessarily incur the overhead of end-to-end error control. 

A requirement that emerges from this scenario is the need for the application 
routing infrastructure to return a number of EEPs on a particular path. We may 
ask of the ALR: give me a path between node A and node B on the network 
which minimises delay. We may also ask for a path between node A and B 
which maximises throughput, by being more responsive to errors. It may also 
be possible that we require more than one path between node A and node B for 
fault tolerance. 
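
A query such as "give me a path between node A and node B which minimises delay" could be answered, once a delay-weighted view of the EEP mesh is available, with a standard shortest-path computation. The sketch below uses Dijkstra's algorithm as one possible technique; the graph layout and node names are assumptions made for illustration, not part of the paper.

```python
import heapq
from typing import Dict, List, Tuple

# node -> {neighbour: delay in milliseconds}; links may be one-way
Graph = Dict[str, Dict[str, float]]


def min_delay_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Return (total delay, list of nodes) for the lowest-delay path."""
    best = {src: 0.0}
    prev: Dict[str, str] = {}
    queue = [(0.0, src)]
    while queue:
        delay, node = heapq.heappop(queue)
        if node == dst:
            break
        if delay > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, cost in graph.get(node, {}).items():
            cand = delay + cost
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(queue, (cand, nxt))
    if dst not in best:
        return float("inf"), []  # no path known
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]
```

The intermediate nodes of the returned path are the EEPs on which TCPbridge proxylets would be started.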

2.3 VOIP Gateway 

A proxylet that we intend to write is co-located with a gateway from the Internet 
to the PSTN. The idea is that a person is using their PDA (Personal Digital 
Assistant) with a wireless network interface such as IEEE 802.11 or perhaps 
Bluetooth [35]. 

Using a packet audio application such as “vat”, the user wishes to make a 
voice call to a telephone via an IP-to-telephony gateway. The simple thing to do 
would be to discover a local gateway. A more interesting solution would be to 
find the closest gateway to the telephony endpoint. An argument for doing this 
might be that it is cheaper to perform the long haul part of the connection over 
the Internet rather than by using the PSTN. 

This adds another requirement: the ability to discover information about 
available services. The normal model for service discovery is to find a service in 
the local domain. We have a requirement for service discovery across the whole 
domain in which EEPs are running. So a proxylet on an EEP which is providing 
a VOIP gateway may want to inject information into the routing infrastructure 
which can be used by VOIP aware applications. 

2.4 Multicast 

With the seeming failure of wide area multicast deployment it starts to make 
sense to use proxylets inside the network to perform fanout of streams, as well 
as perhaps transcoding and retransmission. A requirement that emerges for mul- 
ticast is that there is enough information in the routing infrastructure for the 
optimal placement of fanout points. 

3 Application Layer Routing Architecture 

A number of routing requirements for our ALAN infrastructure have emerged 
from the examples described in the previous section. In essence our broad goal 
is to allow clients to choose an EEP or set of EEPs on which to run proxylets 
based on one or more cost functions. Typical metrics will include: 

— Available network bandwidth. 

— Current delay. 

— EEP resources. 

— Topological proximity. 

We may also want to add other constraints such as user preferences, policy, 
pricing, etc. In this paper we focus initially on metric-based routing. 

An Application Layer Routing (ALR) solution must scale to a large, global 
EEP routing mesh. It must permit EEPs to discover other EEPs, and to maintain 
a notion of “distance” between each EEP in a dynamic and scalable manner. 
It will allow clients to launch proxylets (or “services”) based on one or more 
metric specifications and possibly other resource contingencies. Once these ser- 
vices are launched, they become the entities that perform the actual “routing” 
of information streams. 
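
The client-side choice of an EEP "based on one or more metric specifications and possibly other resource contingencies" can be pictured as minimising a cost function over the candidates that satisfy a constraint. The following sketch assumes a hypothetical `EepState` record holding the four metrics listed above; the field names and units are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class EepState:
    """Hypothetical per-EEP view as seen by a client."""
    name: str
    bandwidth_kbps: float   # available network bandwidth
    delay_ms: float         # current delay
    free_cpu: float         # EEP resources, as a fraction 0..1
    hops: int               # topological proximity


def choose_eep(candidates: Iterable[EepState],
               cost: Callable[[EepState], float],
               require: Callable[[EepState], bool] = lambda e: True
               ) -> Optional[EepState]:
    """Return the eligible EEP with the lowest cost, or None if none qualifies."""
    eligible = [e for e in candidates if require(e)]
    return min(eligible, key=cost, default=None)
```

Passing a different `cost` (for example `lambda e: -e.bandwidth_kbps`) gives a different choice over the same candidates, which is the sense in which the problem is multi-metric.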

We therefore propose an ALR architecture that has four components. In this 
section we simply provide an overview of the architecture. The four components 
are as follows. 

1. EEP Discovery. 

2. Routing Exchanges. 

3. Service Creation. 

4. Information Routing. 
EEP discovery is the process whereby an EEP discovers (or is able to 
discover) the existence of all other EEPs in the global mesh. In our current 
implementation (described below) all EEPs register at a single point in the network. 
This solution is clearly not scalable, requiring the introduction of a notion of 
hierarchy. Our proposed approach is described in the next section. The approach 
addresses both the arrival of new EEPs and the termination or failure of existing 
EEPs. 

Routing exchanges are the processes whereby EEPs learn the current state 
of the ALAN infrastructure with regard to the various metrics. On this basis 
EEP Routing Tables are built and maintained. The routing meshes embedded 
in routing tables may also implement notions of clusters and hierarchy. How- 
ever these structures will be dynamic, depending on the state of the network, 
with different structures for different metrics. The state information exchanges 
from which routing information is derived may be explicitly transmitted between 
EEPs, or may be inferred from the observation of information streams. 

Service creation is the process whereby a proxylet or set of proxylets is 
deployed and executed on one or more EEPs. The client of this service creation 
service specifies the proxylets to be launched and the choice(s) of EEP, specified 
via metrics. The client may also specify certain service dependencies such as 
EEP resource requirements. 

The service proxylets to be launched depend entirely on the service being 
provided. They encompass all the examples described in the previous section. 
For example, the webcache proxylet launches transcoders or compressors ac- 
cording to mime content type. The metric used here will be some proximity 
constraint (e.g. delay) to the data source, and/or available bandwidth on the 
path. The TCP bridge proxylets will be launched to optimise responsiveness 
to loss and maximise throughput. The VOIP gateway proxylet will require a 
telephony gateway resource at the EEP. 

Information routing is the task performed by the proxylets once launched. 
Again the functions performed by these proxylets are dependent on the service. 
They may entail information transcoding, compression, TCP bridging or multicast 
splitting. In each case information is forwarded to the next point(s) in an 
application level path. 

The rest of this paper is devoted to describing our more detailed proposals 
for EEP Discovery and Routing Exchanges. 

4 Discovery 

4.1 Discovery Phase 

We will now describe how the discovery phase takes place. The discovery phase 
is implemented by a “discovery proxylet” that is pre-configured with each EEP. 
Since proxylets are loaded on EEPs by URL reference, it is trivial to update the 
version of the discovery proxylet - it will be automatically loaded when an EEP 
starts up. 
The function of the discovery phase is for all EEPs to join a global “database”, 
which can be used/interrogated by a “routing proxylet” (discussed further in the 
next section) to find the location of an EEP(s) which satisfies the appropriate 
constraints. Constructing this database through the discovery phase is the first 
stage towards building a global routing infrastructure. 



Requirements 

There are a number of requirements for our discovery phase: 

1. The solution should be self configuring. There should be no static configu- 
ration such as tunnels between EEPs. 

2. The solution should be fault tolerant. 

3. The solution should scale to hundreds or perhaps thousands of deployed 
EEPs. 

4. No reliance on technologies such as IP multicast. 

5. The solution should be flexible. 

The Discovery Protocol 

The solution involves building a large distributed database of all nodes. A 
naive registration/discovery model might have all registrations always going to 
one known location. However it is obvious that such a solution would not scale 
beyond a handful of nodes. It would not be suitable for a global mesh of hundreds 
or thousands of nodes. 

In order to spread the load we have opted for a model where there is a 
hierarchy. Initially registrations may go to the root EEP. But a new list of EEPs 
to register with is returned by the EEP. So a hierarchy is built up. An EEP has 
knowledge of any EEPs which have registered with it as well as a pointer to 
the EEP above it in the hierarchy. So both the information regarding all the EEPs 
and the destinations of the registration messages are distributed. The time 
to send the next registration message is also included in the protocol. So as the 
number of EEPs grows or as the system stabilises the frequency of the messages 
can be decreased. 

If an EEP that is being registered with fails, the EEP registering with it will 
just try the next EEP in its list until it gets back to the root EEP. 
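
The registering side of this exchange can be sketched as follows. The `RegistrationAck` record and the `send` callable are hypothetical stand-ins for the real message and transport, which the paper does not specify at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RegistrationAck:
    next_registration_s: float   # delay before the next registration
    register_with: List[str]     # ordered list; typically ends with the root


# Returns the acknowledgement, or None if the EEP did not answer.
Send = Callable[[str], Optional[RegistrationAck]]


def register_once(targets: List[str],
                  send: Send) -> Tuple[Optional[str], Optional[RegistrationAck]]:
    """Try each target in order and return the first EEP that answers."""
    for eep in targets:
        ack = send(eep)
        if ack is not None:
            return eep, ack
    return None, None
```

After a successful call, the caller would wait `ack.next_registration_s` seconds and then repeat with `ack.register_with` as the new target list, so the hierarchy and the registration frequency are both steered by the acknowledgements.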

With this hierarchical model, if the list of all EEPs is required then an 
application (normally this will only be a routing proxylet) can contact any EEP and 
send it a node request message. In response to a node request message three 
chunks of information will be returned: a pointer up the hierarchy where this 
EEP last registered; a list of EEPs that have registered with this EEP if any; 
a list of the backup EEPs that this EEP might register with. Using the node 
message interface it is possible to walk the whole hierarchy. So either an appli- 
cation extracts the whole table and starts routing proxylets on all nodes, or a 
routing proxylet is injected into the hierarchy which replicates itself using the 
node message. 
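
Walking the whole hierarchy from an arbitrary starting EEP amounts to a graph traversal over the parent and child pointers returned in node acknowledgements. A minimal sketch, assuming a hypothetical `node_request` callable that returns the three pieces of information listed above:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class NodeAck:
    parent: Optional[str]                               # where this EEP last registered
    children: List[str] = field(default_factory=list)   # EEPs registered with it
    backups: List[str] = field(default_factory=list)    # alternative parents


def walk_hierarchy(start: str,
                   node_request: Callable[[str], NodeAck]) -> Set[str]:
    """Breadth-first walk over parent and child pointers from any EEP."""
    seen = {start}
    todo = deque([start])
    while todo:
        ack = node_request(todo.popleft())
        neighbours = list(ack.children)
        if ack.parent is not None:
            neighbours.append(ack.parent)
        for name in neighbours:
            if name not in seen:
                seen.add(name)
                todo.append(name)
    return seen
```

The backup list is not followed here; it would only be consulted if the request to a parent failed during the walk.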
The discovery proxylet offers another service. It is possible to register with 
the discovery proxylet to discover state changes. The state changes that are of 
interest are a change in where registration messages are going, an EEP which has 
failed to re-register being timed out, and a new EEP joining the hierarchy. 

In the discussion above we have not said anything about how the hierarchy 
is actually constructed. We don’t believe that it actually matters so long as we 
have distributed the load, to provide load balancing and fault tolerance. 

It may seem intuitive that the hierarchy be constructed around some metric 
such as Round Trip Time (RTT). So, say, all EEPs in the UK register with an 
EEP in the UK. This would certainly reduce the network load against, say, a 
model that had all the registrations made by EEPs in the UK going to Australia. 
A defence against pathological hierarchies is that the registration time can be 
measured in minutes or hours not seconds. We can afford such long registration 
times because we expect the routing proxylets to exchange messages at a much 
higher frequency and form hierarchies based on RTT, bandwidth, etc. So 
a failed node will be discovered rapidly at the routing level. Although it seems 
like a chicken and egg situation the discovery proxylets could request topology 
information from the routing proxylets, to aid in the selection of where a new 
EEP should register. We also don’t want to repeat the message exchanges that 
will go on in the higher level routing exchanges. 

We have considered two other mechanisms for forming hierarchies. The first is 
a random method. In this scheme a node will only allow a small fixed number of 
nodes to register with it. Once this number is exceeded any registration attempts 
will be passed to one of the list of nodes which is currently registered with 
the node. The selection can be made randomly. If, for example, the limit is 
configured to be five, the sixth registration request will be given one of the 
already registered nodes as its new parent EEP. This solution will obviously 
form odd hierarchies in the sense that they do not map onto the topology of the 
network. It could however be argued that this method may give an added level 
of fault tolerance. 
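
The random method can be simulated in memory as below. The `Node` class is hypothetical, and the redirect is followed inside `register` for brevity; in the protocol itself the newcomer would receive the redirect and send a fresh registration request.

```python
import random
from typing import List, Optional


class Node:
    """An EEP that accepts at most `limit` direct registrations."""

    def __init__(self, name: str, limit: int = 5,
                 rng: Optional[random.Random] = None) -> None:
        self.name = name
        self.limit = limit
        self.children: List["Node"] = []
        self.rng = rng or random.Random()

    def register(self, newcomer: "Node") -> "Node":
        """Place newcomer in the hierarchy and return its parent."""
        node = self
        # A full node passes the request to one of its children at random.
        while len(node.children) >= node.limit:
            node = self.rng.choice(node.children)
        node.children.append(newcomer)
        return node
```

With the limit set to five, the first five registrations become children of the root and the sixth is handed to one of those five.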

The second method that we have been considering is a hierarchy based on 
domain names, so that the hierarchy maps directly onto the DNS hierarchy. 
Thus all sites with the domain “.edu.au” register with the same node. In this 
case we can use proximity information derived from DNS to build the hierarchy. 
This scheme will however fail with the domain “.com”, where nothing can be 
implied about location. Also, the node that is accepting registrations for the 
“.com” domain will be overwhelmed. 
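
One simple way to derive proximity from domain names is to count how many trailing labels two names share. The sketch below is an assumed heuristic of that kind, with hypothetical host names; it also shows why “.com” is uninformative, since two unrelated “.com” hosts still share one label.

```python
from typing import Iterable, Optional


def shared_suffix_labels(a: str, b: str) -> int:
    """Number of trailing DNS labels that two names have in common."""
    la = a.lower().rstrip(".").split(".")[::-1]
    lb = b.lower().rstrip(".").split(".")[::-1]
    count = 0
    for x, y in zip(la, lb):
        if x != y:
            break
        count += 1
    return count


def closest_by_dns(target: str, eeps: Iterable[str]) -> Optional[str]:
    """Pick the EEP whose name shares the longest suffix with target."""
    return max(eeps, key=lambda e: shared_suffix_labels(target, e), default=None)
```

A registration hierarchy built this way would group names by suffix, so the node serving a very popular suffix would attract a disproportionate share of registrations.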

We believe that, since registration exchanges occur infrequently, we can 
choose a hierarchy forming mechanism which is independent of the underly- 
ing topology. The more frequent routing exchanges discussed in the next section 
will map onto the topology of the network and detect any node failures. 

We have discussed a hierarchy with a single root. If this proved to be a 
problem, we could have an infrastructure with multiple roots. But unlike the rest 
of the hierarchy the root nodes would have to be aware of each other through 
static configuration, since they would have to pool information in order to behave 
like a single root. 



Messages Exchanged by Discovery Proxylet 

The types of messages used are registration messages and node messages. The 
registration messages are used solely to build the hierarchy. The node messages 
are used to interrogate the discovery infrastructure. 

Version numbers are used to both detect a protocol mismatch, and to trigger 
the reloading of a new version of the discovery proxylet. The host count will 
make it simple to discover the total number of nodes by interrogating the root 
node. 

— Registration request message. 

• Version number. 

• This node's name. 

• Count of hosts registered below this node. 

— Registration acknowledgement message, sent in response to a registration 
request message. 

• Version number. 

• Next registration time. 
A delay time before the next registration message should be sent. This 
timer can be adjusted dynamically as a function of load, or reliability of 
a node. If a node has many children the timer may have to be increased 
to allow this node to service a large number of requests. If a child node 
has never failed to register in the required time, then it may be safe to 
increase the timeout value. 

• List of nodes to register with, used for fault tolerance. Typically the last 
entry in the list will be the root node. 

— Node request message. 

• Version number. 

— Node acknowledgement message, sent in response to a node request message. 

• Version number. 

• Host this node registers with. 

• List of backup nodes to register with. 
It is useful to have the list of backup nodes in case the pointer up to the 
local node fails while a tree walk is taking place. 

• List of nodes that register with this node. 
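
The four message types listed above map naturally onto simple records. The Python sketch below is one possible in-memory representation; the field names, the version comparison rule and the way the host count is aggregated are assumptions, since the paper gives only the field lists.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RegistrationRequest:
    version: int
    node_name: str
    hosts_below: int             # hosts registered below the sender


@dataclass
class RegistrationAck:
    version: int
    next_registration_s: float   # delay before the next registration
    register_with: List[str]     # fallback list, root typically last


@dataclass
class NodeRequest:
    version: int


@dataclass
class NodeAck:
    version: int
    registers_with: Optional[str]   # None at the root
    backups: List[str]
    children: List[str]


def needs_reload(local_version: int, message_version: int) -> bool:
    """Assumed rule: a newer version on the wire triggers a proxylet reload."""
    return message_version > local_version


def hosts_below(children: Iterable[RegistrationRequest]) -> int:
    """Each child counts itself plus everything it reports below it."""
    return sum(1 + c.hosts_below for c in children)
```

Because each node reports the count for its own subtree, the value computed at the root is the total number of nodes below it, which is why interrogating the root is enough.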

5 Routing Exchanges 

Once the underlying registration infrastructure is in place this can be used to 
start routing proxylets on the nodes. The process is simple and elegant. A routing 
proxylet can be loaded on any EEP. One routing proxylet needs to be started on 
just one EEP anywhere in the hierarchy. By interrogating the discovery proxylet 
all the children can be found as well as the pointers up the tree. The routing 
proxylet then just starts an instance of itself on every child and on its parent. 
This process repeats and a routing proxylet will be running on every EEP. The 
routing proxylet will also register with the discovery proxylet to be informed 
of changes (new nodes, nodes disappearing, change in parent node). So once a 
routing proxylet has launched itself across the network, it can track changes in 
the network. 

Many different routing proxylets can be written to solve various problems. It 
may not even be necessary to run a routing proxylet on each node. It may be 
possible to build up a model of the connectivity of the EEP mesh by only occasionally 
having short lived, probing proxylets running on each cluster. The boundary 
between a centralised routing infrastructure and a distributed routing 
infrastructure can be shifted as appropriate. 

We believe that many different routing proxylets will be running, using 
different metrics for forming topologies. Obvious examples would be paths optimised 
for low latency or paths optimised for high bandwidth. This is discussed further 
below. 

5.1 Connectivity Mesh 

In a previous section we described a number of proxylets that we have already 
built and are considering building, along with their routing requirements. Some 
routing decisions can be made satisfactorily by using simple heuristics such as 
domain name. There will however be a set of services which require more accurate 
feedback from the routing system. 

We believe that some of the more complex routing proxylets will have to make 
routing exchanges along the lines of map distribution (MD) [8]. An MD algorithm 
floods information about local connectivity to the whole network. With this 
information topology maps can be constructed. Routing computations can be 
made hop by hop. An example may be an algorithm to compute the highest 
bandwidth pipe between two points in the network. A more centralised approach 
may be required for application layer multicast where optimal fan out points are 
required. 
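
Given a topology map with a bandwidth figure on each edge, the "highest bandwidth pipe" between two points is the path whose narrowest link is as wide as possible. The sketch below computes it with a bottleneck variant of Dijkstra's algorithm; this is one standard technique chosen for illustration, and the graph layout is an assumption.

```python
import heapq
from typing import Dict, List, Tuple

# node -> {neighbour: available bandwidth in kbps}
Graph = Dict[str, Dict[str, float]]


def widest_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Return (bottleneck bandwidth, path) maximising the narrowest link."""
    best = {src: float("inf")}
    prev: Dict[str, str] = {}
    queue = [(-float("inf"), src)]   # max-heap via negated bandwidth
    while queue:
        neg_bw, node = heapq.heappop(queue)
        bw = -neg_bw
        if node == dst:
            break
        if bw < best.get(node, 0.0):
            continue  # stale queue entry
        for nxt, capacity in graph.get(node, {}).items():
            cand = min(bw, capacity)
            if cand > best.get(nxt, 0.0):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(queue, (-cand, nxt))
    if dst not in best:
        return 0.0, []  # no path known
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]
```

In the example topology used in the tests, a two-hop path through a well connected EEP beats the direct 56 kbps link.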

In fixed routing architectures each node, by some mechanism, propagates 
some information about itself and physically connected neighbours. Metrics such 
as bandwidth or RTT for these links may be included in these exchanges. In the 
ALR world we are not constrained to using only physical links to denote neigh- 
bours. There may be conditions where nodes on opposite sides of the world may 
be considered neighbours. Also links do not necessarily need to be bidirectional. 
We expect to use various metrics for selecting routing neighbours. We won’t 
necessarily be distributing multiple metrics through one routing mesh. We may 
create a separate routing mesh for each metric. This is explored further below. 

Another important issue which doesn’t usually arise from traditional routing 
is that if care is not taken, certain nodes may disappear from the routing cloud 
if a node cannot find a neighbour. 
5.2 Snooping for Network State 

The maintenance of network state can be performed via periodic exchanges 
between routing proxylets. While this has merit, it has the disadvantage that 
the exchanges themselves put load on the network, and therefore impact network 
performance. While not dismissing this approach, we propose here an alternative 
approach that uses more implicit routing exchanges. 

Proxylets started on EEPs make use of standard networking APIs to 
transmit information. It would be relatively simple to add a little shim layer, such that 
whenever a networking call is made we can estimate, for example, the bandwidth 
of a path. Thus bandwidth information derived from service proxylets, such as 
a multicast proxylet, can be fed back into the routing infrastructure. An initial 
proposal for how this might be utilised is now discussed. 
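
A shim of this kind could be as small as a wrapper that times each send call and keeps a smoothed estimate. The sketch below is deliberately crude: timing a single call mostly measures how quickly data is handed to the local stack, so a real shim would need to observe acknowledged data over longer intervals. The class name, the smoothing factor and the injected clock are assumptions.

```python
import time
from typing import Callable, Optional


class ThroughputShim:
    """Wraps a send callable and keeps a smoothed throughput estimate."""

    def __init__(self, send: Callable[[bytes], None], alpha: float = 0.25,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._alpha = alpha
        self._clock = clock
        self.kbps: Optional[float] = None   # None until the first sample

    def send(self, data: bytes) -> None:
        start = self._clock()
        self._send(data)
        elapsed = self._clock() - start
        if elapsed <= 0:
            return  # too fast to time; keep the previous estimate
        sample = len(data) * 8 / elapsed / 1000   # kilobits per second
        if self.kbps is None:
            self.kbps = sample
        else:
            self.kbps = self._alpha * sample + (1 - self._alpha) * self.kbps
```

The current value of `kbps` is what a service proxylet would report to the routing infrastructure for the path it is using.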

5.3 Self Organising Application-Level Routing - SOAR 

We propose a recursive approach to this, based on extending ideas from RLC[14] 
and SOT[13], called Self Organised Application-level Routing (SOAR). The idea 
is that a SOAR does three tasks: 

1. Exchanges graphs/maps [8] [9] with other SOARs in a region, using traffic 
measurement to infer costs for edges in graphs to re-define regions. 

A region (and there are multiple sets of regions, one per metric) is informally 
defined as a set of SOARs with comparable edge costs between them, and 
“significantly” different edge costs out of the region. An election procedure 
is run within a region to determine which SOAR reports the region graph 
to neighbour regions. Clearly, the bootstrap region is a single SOAR. 

The above definition of “significant” needs exploring. An initial idea is to use 
the same approach as RLC[14] - RLC uses a set of data rates distributed 
exponentially - typically, say, 10 kbps, 56 kbps, 256 kbps, 1.5 Mbps, 45 Mbps 
and so on. 

This roughly corresponds to the link rates seen at the edge of the net, and 
thus to a set of users' possible shares of the net at the next “level up”. Fine 
tuning may be possible later. 
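
One way to turn the exponentially spaced rates into a test for “significantly” different edge costs is to place each measured rate in a band and compare bands. The sketch below assumes the five example rates quoted above as band boundaries; treating two edges as comparable when they fall in the same band is an illustrative reading, not a rule stated in the paper.

```python
import bisect

# Band boundaries in kbps, taken from the example rates in the text.
RATE_BANDS_KBPS = [10, 56, 256, 1500, 45000]


def band(rate_kbps: float) -> int:
    """Index of the band a measured rate falls into (0 = below 10 kbps)."""
    return bisect.bisect_right(RATE_BANDS_KBPS, rate_kbps)


def comparable(rate_a_kbps: float, rate_b_kbps: float) -> bool:
    """True if two edge costs fall in the same band."""
    return band(rate_a_kbps) == band(rate_b_kbps)
```

Under this reading, a neighbour reached at 64 kbps and one reached at 200 kbps could share a region, while a neighbour reached at 300 kbps would lie across a region boundary.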

2. SOAR uses measurement of user traffic (as in SOT[13]) to determine the 
available capacity. Either RTCP reports of explicit rate, or inferring the 
available rate by modelling a link and other traffic with the Padhye[6] 
equation, provide ways to extract this easily. Similarly, RTTs can be 
estimated from measurement or reports (or an NTP proxylet could be 
constructed fairly easily). This idea is an extension of the notion proposed by 
Villamizar[10] using the Mathis[5] simplification, in “OSPF Optimized 
Multipath”: $p < (MSS / (BW \cdot RTT))^2$ 

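A more complex version of this was derived by Padhye et al.: 

$$B = \min\left( \frac{W_m}{RTT},\ \frac{1}{RTT \sqrt{\frac{2bp}{3}} + T_0 \min\left(1,\ 3\sqrt{\frac{3bp}{8}}\right) p \left(1 + 32p^2\right)} \right) \qquad (1)$$

Here $T_0$ is the TCP retransmission timeout used in the Padhye model. 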
Table 1. Terms in the Padhye TCP Equation 

Term   Meaning 
W_m    Maximum advertised receive window 
RTT    The Round Trip Time 
b      The number of packets acknowledged by 1 ACK 
p      The mean packet loss probability 
B      The throughput achieved by a TCP flow 



RTT is estimated in the usual way, if there is two way traffic: 
$RTT_i = \alpha \cdot RTT_{sample} + (1 - \alpha) \cdot RTT_{i-1}$, where 
$RTT_{sample}$ is the latest measurement. It can also be derived using NTP 
exchanges. 

Then we simply measure loss probability (p) with an EWMA. Smoothing 
parameters (alpha, beta for RTT, and loss averaging period for p) need to 
be investigated carefully - note that a lot of applications use this equation 
directly now [7] rather than AIMD sending à la TCP. This means that the 
available capacity (after you subtract fixed rate applications like VOIP) is 
well modelled by this. 
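
The calculations in this step can be sketched in Python as follows. The Padhye function implements the full model of Padhye et al., which needs the TCP retransmission timeout `t0` in addition to the terms of Table 1; that parameter, the units (seconds, packets, bytes) and all names are assumptions of this sketch. The loss bound is the inequality p < (MSS/(BW * RTT))^2 taken as written, without any constant factor.

```python
import math
from typing import Optional


class Ewma:
    """Exponentially weighted moving average, usable for RTT or loss."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.value: Optional[float] = None

    def update(self, sample: float) -> float:
        if self.value is None:
            self.value = sample   # first sample seeds the average
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


def mathis_loss_bound(mss_bytes: float, bw_bytes_per_s: float,
                      rtt_s: float) -> float:
    """Upper bound on loss probability for a target bandwidth."""
    return (mss_bytes / (bw_bytes_per_s * rtt_s)) ** 2


def padhye_throughput(w_max: float, rtt_s: float, b: float,
                      p: float, t0_s: float) -> float:
    """TCP throughput in packets per second according to the Padhye model."""
    window_limit = w_max / rtt_s
    if p <= 0:
        return window_limit   # no loss: limited by the receive window
    denominator = (rtt_s * math.sqrt(2 * b * p / 3)
                   + t0_s * min(1.0, 3 * math.sqrt(3 * b * p / 8))
                   * p * (1 + 32 * p * p))
    return min(window_limit, 1.0 / denominator)
```

A SOAR would feed smoothed RTT and loss values from two `Ewma` instances into `padhye_throughput` to obtain the capacity figure it advertises for an edge.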

3. Once a SOAR has established a metric to its bootstrap configured neighbour, 
it can declare whether that neighbour is in its region, or in a different region 
- as this continues, clusters will form. The neighbour will report its set of 
“neighbour” SOARs (as in a distance vector algorithm) together with their 
metrics (strictly, we don’t need the metrics if we are assuming all the SOARs 
in a region are similar, but there are lots of administrative reasons why we 
may - in any case, a capacity-based region will not necessarily be 
congruent with a delay-based region. Also, it may be useful to use the neighbour 
exchanges as part of the RTT measurement). 

The exchanges are of region graphs or maps - each SOAR on its own forms a 
region and its report is basically like a link state report. We should explore 
whether the SOAR reports should be flooded within a region, or accumulated 
as with a distance vector. 

A graph is a flattened list of node addresses/labels, with a list of edges for 
each node, each with one or more metrics. 

Node and Edge Labels are in a URL-like syntax, for example

soar://node-id.region-id.soar-id.net

and an Edge label is just the far-end node label.

Metrics are <type, value> tuples (ASCII syntax). Examples of metrics 
include: 

— A metric for delay is typically milliseconds 

— A metric for throughput is Kbps 

— A metric for topological distance is hop count 

— A metric for topological neighbour is IP address/mask 

As stated above, a SOAR will from time to time discover that a neighbour
SOAR is in a different region. At this point, it marks itself as the "edge"
of a region for that metric. This is an opportunity for scaling: the SOARs
in a region use an election procedure to determine which of them will act
on behalf of the region. The elected SOAR (chosen by lowest IP address, or
perhaps at the Steiner centre, or maybe by configuration) then prefixes the
labels with a region id (perhaps made up from the date/time and the elected
SOAR node label):

soar://node-id.region-id.soar-id.net/metricname

soar://node-id.region-id.region-id.soar-id.net/metricname

soar://node-id.region-id.region-id.region-id.soar-id.net/
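
The election and label-prefixing step can be sketched in Python as follows.
This is only an illustration under stated assumptions: the election rule shown
is the "lowest IP address" option mentioned above, the new region id is assumed
to be inserted immediately before the soar-id.net suffix (the text does not
fix the position), and the function names, node ids and region ids are all
hypothetical.

```python
import ipaddress

# Assumption: labels end in the placeholder domain used in the text.
SOAR_SUFFIX = ".soar-id.net"


def elect_representative(addresses):
    """Return the SOAR whose IP address is numerically lowest."""
    # Compare as IP addresses, not strings, so "10.0.0.3" beats "10.0.0.12".
    return min(addresses, key=ipaddress.ip_address)


def prefix_label(label, region_id):
    """Add one more region level to a soar:// node or metric label."""
    scheme, rest = label.split("://", 1)
    host, slash, metric = rest.partition("/")
    if not host.endswith(SOAR_SUFFIX):
        raise ValueError("not a SOAR label: " + label)
    stem = host[: -len(SOAR_SUFFIX)]
    return f"{scheme}://{stem}.{region_id}{SOAR_SUFFIX}{slash}{metric}"


# Hypothetical region of three SOARs and one label being aggregated.
region = ["10.0.0.12", "10.0.0.3", "10.0.1.1"]
leader = elect_representative(region)
aggregated = prefix_label("soar://n7.r1.soar-id.net/delay", "r9")
```

Each level of aggregation adds one region component, which mirrors the way the
example labels above grow from one region-id to three.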



6 Current Status 

We have a prototype system “funnelWeb 2.0.1” [32], which supports the deploy- 
ment of proxylets. We are running EEPs at a number of Universities in the UK 
as well as at UTS. We have a very basic combined discovery and routing
proxylet which supports the notion of proximity. A DNS name can be given to
the routing proxylet and an EEP close to the name will be returned. The first
cut at this problem just matches on DNS names. This very crude approximation
works surprisingly well; for example, it is able to find an EEP close to a web
server in our text compression example.

We have a test page at
<URL:http://dmir.socs.uts.edu.au/projects/alpine/routing/index.html>. One
of the advantageous features of writing the discovery and routing entities as
proxylets has been that we can totally redeploy the whole infrastructure in a
very short time.

We are in the process of rewriting our web cache proxylet [33], to make use 
of the routing infrastructure. An implementation of the RTT and bandwidth 
estimators for SOAR is underway. 
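
As a rough illustration of what such estimators compute, the following Python
sketch implements the EWMA smoothing and the two throughput formulas quoted in
Section 5. It is not the funnelWeb implementation. The function names are
hypothetical, the EWMA assumes that the term weighted by alpha is the newest
sample, the Mathis form omits the constant factor exactly as the inequality in
the text does, and T0 is the retransmission timeout of the Padhye model.

```python
import math


def ewma(previous, sample, alpha):
    """Smooth a measurement: alpha weights the newest sample (assumption)."""
    return alpha * sample + (1.0 - alpha) * previous


def mathis_bound(mss, rtt, p):
    """Rearranged p < (MSS / (BW * RTT))^2, i.e. BW < MSS / (RTT * sqrt(p))."""
    return mss / (rtt * math.sqrt(p))


def padhye_throughput(wm, rtt, b, p, t0):
    """Equation (1): throughput in packets per second.

    wm is in packets, rtt and t0 in seconds, p is a probability.
    """
    if p <= 0.0:
        return wm / rtt  # no loss observed: limited by the receive window only
    loss_term = rtt * math.sqrt(2.0 * b * p / 3.0)
    timeout_term = (t0 * min(1.0, 3.0 * math.sqrt(3.0 * b * p / 8.0))
                    * p * (1.0 + 32.0 * p * p))
    return min(wm / rtt, 1.0 / (loss_term + timeout_term))


# Hypothetical measurements: 100 ms RTT, 1% loss, delayed ACKs (b = 2).
smoothed_rtt = ewma(previous=0.100, sample=0.200, alpha=0.125)
rate = padhye_throughput(wm=64, rtt=0.1, b=2, p=0.01, t0=1.0)
```

The same `ewma` helper can be reused for the loss probability p, with its own
smoothing parameter.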



7 Related Work 

Active Networks has been an important area of research since the seminal paper
by Tennenhouse et al.[2] (some people would suggest that this work was preceded
by the Softnet [1] work). This paper is based on work in a research project
which is more oriented towards active services [3][4], which has emerged as an
important subtopic through the Openarch conference and related events.

In the current work, we are attempting to address problems associated with
self-organisation and routing, both internal to Active Services infrastructure,
as well as in support of specific user services. To this end we are taking a
similar approach to the work in scout [23], specifically the joust system[24],
rather than the more adventurous, if less deterministic, approach evidenced in
the Ants work[20][21]. Extension of these ideas into infrastructure services
has been carried out in the MING Project[11] and is part of the RTP framework
(e.g. Perkins' work on RTP quality[12]).

Much of the work to date in topology discovery and routing in active service
systems has been preliminary. We have tried to draw on some of the more recent
results from the traditional (non-active) approaches to these two sub-problems,
and to this end, we have taken a close look at work by Francis[19], as well as
the nimrod project [8].

Our approach is trying to yield self-organising behaviour as in earlier 
work[13][21][25], as we believe that this is attractive to the network operator 
as well as to the user. 

There have been a number of recent advances in the area of estimation of
current network performance metrics to support end system adaptation, as well
as (possibly multi-path) route selection. For throughput, there is the work by
Mathis [5], Padhye[6] and Handley [7], and for multicast [14], and its
application in routing by Villamizar[10]. More recently, several topology
discovery projects have refined their work in estimating delays, and this was
reported in Infocom this year, for example, in Theilman[15], Stemm[16],
Ozdemir[17] and Duffield[18].



8 Future Work 

The next stage of the research is to experiment by implementation with different 
discovery and routing exchange mechanisms. Implementing these as proxylets 
makes experimental deployment relatively simple. We already have a small (in 
node numbers) global network of EEPs which permits live experimentation. We 
are hoping that more EEPs can be located in different countries and continents 
in the near future (the authors are happy to talk to any groups that may be 
willing to host EEPs). 

As mentioned above, we already have implementations of bandwidth and
RTT estimators for SOAR underway. These build on previous work that
has implemented a reliable multicast protocol in the form of proxylets [14].
Likewise, we intend to build service proxylets based on our Application Layer
Routing infrastructure. These are likely to implement multicast for media
streaming, and link to related work that we have underway that addresses
caching within this infrastructure [33].

Abstract. The goal of this paper is to show the benefits of using reflective
techniques and meta-programming in the context of active networks, i.e.
networks where packets may contain code which programs the network's
behavior. By having separate base-levels and meta-levels it is possible to
better structure services built with mobile code. In this paper we propose an
active network node architecture supporting such a meta-level and we describe
a prototype that integrates meta-code at load-time. Structural reflection is
used to modify the behavior of the active service when installed in the node.



1 Introduction 

Reflection is a powerful mechanism. A well-known example is the Java
introspection API (java.lang.reflect), which allows the exploration of class
structure, notably to dynamically obtain information about the defined
methods, fields, interfaces, etc. Such a mechanism provides great benefits for
debuggers, code browsers and software composition tools. In systems based on
code mobility, it makes it possible to dynamically explore the code that is
pulled, in order to extract the necessary information, for example to bind the
code to a given interface. However, introspection is only one restricted
aspect of a reflective system, which provides more interesting and powerful
possibilities for structuring and composing applications, notably by allowing
a clean separation between orthogonal aspects of the application. This
mechanism has been applied in classical distributed systems to handle
orthogonal aspects such as fault-tolerance [9][22], distributed execution
[25], persistence [16] and security [23][1], and to integrate them in
different kinds of applications. This so-called separation of concerns is
of great interest to software engineers since it makes design and maintenance
of complex software much easier, and applications built with mobile code
technology truly are complex pieces of software.

Reflection allows a computational system to think about itself [18], giving
the possibility to enhance adaptability and to better control the applications
that are based on it. For this, two different levels are defined: a base-level
related to the functional aspects, i.e. the code concerned with computations
about the application domain, and a meta-level handling the non-functional
aspects, i.e. the code supervising the execution of the functional code. A
meta-object contains all the information of a base-level object and is able to
control the execution and behavior of the associated base-level object. The
interactions between the base-level and the meta-level are governed by a
Meta-object protocol (MOP) [13]. Thus, it is possible to manipulate base-level
entities and even redefine how base-level entities are executed. Such
architectures enable the development of highly flexible programs that can
manipulate the state of their own execution. We claim that the combination of
mobility and meta-level manipulations gives a higher level of control and
opens up interesting new possibilities. Reflection and meta architectures are
means to achieve adaptability, and mobile code technology gives extreme
flexibility regarding distribution.

In active networking, where applications can adapt the network to particu- 
lar requirements using mobile code, reflection is an interesting mechanism that 
can be exploited to dynamically integrate non-functional code to a running ac- 
tive network service. An increasing number of algorithms used in classical net- 
work models or classical distributed systems have been adapted to take into 
account benefits of mobile code (e.g. “active” multicast [17], “active” and adap- 
tive routing[32] [15], etc.). Thus, the extreme flexibility of the active model is 
exploited, but on the other hand, the complexity of software design is increased. 
As a consequence, the composition of active services becomes very difficult and 
service designers integrate in the service code some aspects that are not directly 
related to the main functionality of the service itself. For example, tracing the 
activity of active packets and analyzing how they interact with the different 
execution environments is a non-functional aspect that cross-cuts the original 
design of the service and that is often integrated in several parts of the software. 
Furthermore, the insertion of such code in most cases implies stopping the
service execution, integrating the modifications, recompiling and re-deploying
the service over the network. Reflection gives a clean solution for structuring 
services in order to separate those orthogonal aspects. 

Dynamic integration of non-functional aspects is a notion that can be ex- 
ploited in a context of code mobility. In this paper we propose an active node 
architecture supporting a meta-level, the goal of which is to give the developer 
a better control over active network services. One of the major advantages of 
using this approach is that the information obtained from the execution of a 
service can be gathered and combined with similar information handed over sev- 
eral execution environments at the meta-level in order to enhance the overall 
service management, independently from the base-level service code. This leads 
to the possibility of modifying the behavior of the distributed service as a whole. 
For our purpose, which is to incorporate non-functional code into an already
deployed service, we describe alternative approaches making such manipulations
possible by resorting to meta-programming techniques and meta-level
manipulations. We investigate ways to control the execution of mobile entities
by adding a meta-level that supervises services executing in the base-level
environment.

The rest of the paper is structured as follows: Section 2 introduces the
notion of active network service and some considerations in their design
(taking into account functional and non-functional aspects). Section 3
describes an active network node architecture supporting a meta-level and how
the different components interact. It also describes a prototype
implementation providing a limited set of functionalities, followed by very
simple examples using meta-objects. Section 4 discusses related work. Finally,
Section 5 concludes the paper.

2 Active Network Services 

In Active Networking, network packets are active entities (called active packets) 
that contain both data and code [27]. This code can be executed in the active 
network nodes, also called execution environments (EE), that the packets cross, 
making possible the customization of the processing performed on packet fields. 
They can also modify and replace a given node’s packet processing functions, 
enabling computations on user data inside the network and the tailoring of those 
functions according to application-specific requirements. Thus, we define an
active network service as a combination of different kinds of active packets
and active extensions, which are pieces of code that are dynamically installed
in different nodes. Such a combination can be seen as a hybrid between the
capsule and programmable switch models described in [27].

Emerging network-intensive applications, such as streaming multimedia
distribution, on-demand video, etc., have pushed the adaptation of classical
network models and protocols, for example to introduce a reservation
mechanism, such as RSVP in the IP world [33], that provides a higher Quality
of Service (QoS). The active network service notion fulfills such requirements
by allowing the active application to implement its own reservation mechanism,
data distribution, routing, etc. This gives higher flexibility to this kind of
application, e.g. by making it possible to optimize bandwidth consumption by
not sending large amounts of data over links that have low bandwidth capacity.

2.1 Service Design and Separation of Concerns 

Designing active services can be a complex task because several aspects
related to the underlying active model must be considered, e.g. how to
distribute and deploy active extensions, what functionalities must be
implemented in active extensions and how active packets interact with them. In
addition, the problem of the design cross-cut is observed when the code that
is related to one aspect (e.g. synchronization, security checks, tracing,
etc.) is spread over several parts of the application, so that any
modification of it implies modifications in several parts of the service
components or even a complete redesign of the application. Such aspects are
considered as non-functional or orthogonal aspects since they are not directly
related to the main functionality of the service. The idea is then to define
the non-functional code as meta-objects that can be applied to the base
service, allowing the service to be implemented without taking into account
the non-functional aspects. This separation of concerns makes it possible to
develop both parts independently, i.e. there is no need to modify the service
code if the non-functional code is modified, and conversely. Thus, it is
highly desirable to identify non-functional aspects in order to help the
designer to build services with a clear separation between functional and
non-functional aspects. The difficulty resides in the fact that (a)
non-functional aspects depend strongly on the context in which the application
is executed and of course on the application itself, and (b) some
non-functional aspects are not known at design time and appear afterwards.

The integration of non-functional code can be done (a) before the service 
deployment at compile-time, (b) during the service deployment at load-time or 
(c) after the service deployment at run-time. The first solution is less
interesting in the context of active networks, since this approach fixes the
service and no adaptation is possible in the different EEs. The latter two
solutions are more interesting since they make it possible to integrate
non-functional code when mobile code is loaded, installed or running on EEs.
This means that a service can
be deployed in the active network and that the non-functional code can be 
dynamically integrated without any modification to the service. The service code 
is designed as if the target EE has no meta-level support at all. 

In the following, we describe an active network architecture that fulfills those 
requirements and a prototype implementation based on a load-time approach. 



3 An Active Network Node Architecture 

The active node architecture supporting the integration of non-functional code
is composed of three different levels (see Fig. 1): the active packet level,
the base service level and the meta service level. The active packet level
handles only incoming active packets and performs a customized forwarding
processing. The base service level contains both active extensions and active
packets and handles their execution. The base service and the active packet
level together provide the usual functionalities found in other active network
systems. What is different is the introduction of the meta-level containing
meta-objects (that we call meta-managers). They take control of the execution
of base service level entities. Meta-objects are executed in their own
meta-level environment. This EE provides the necessary mechanisms to bind
meta-managers to the underlying base-services and capture their activity.

At the active packet level, a selector associates one of two different actions
with each incoming active packet: (a) the active packet stays at this level
and a packet forwarding processing is applied to it (such processing can be
customized as a particular active extension), or (b) the active packet is
passed to the service level, where the active packet code is loaded and
executed. The running active packet can interact with the active extensions
that are installed and even trigger the installation of new extensions. The
active packet can also migrate after its execution in the base service level.
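
The two actions of the selector can be sketched as follows. This is a
hypothetical Python illustration, not the node's actual code: the class and
function names are invented, and the selection criterion (whether the packet
carries code to execute) is an assumption, since the text does not say how the
selector decides.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActivePacket:
    destination: str
    data: bytes
    code: Optional[Callable] = None  # assumption: None means "forward only"


class ActiveNode:
    def __init__(self, forward=None):
        self.extensions = {}  # active extensions installed at service level
        # The forwarding processing itself can be customized.
        self.forward = forward or (lambda packet: ("forward", packet.destination))

    def select(self, packet):
        if packet.code is None:
            return self.forward(packet)   # action (a): stay at packet level
        return packet.code(self, packet)  # action (b): run at service level


def install_counter(node, packet):
    """Packet code that installs (or reuses) an extension, then migrates."""
    counter = node.extensions.setdefault("counter", {"seen": 0})
    counter["seen"] += 1
    return ("migrate", packet.destination)


node = ActiveNode()
plain = node.select(ActivePacket("node-b", b"payload"))
active = node.select(ActivePacket("node-b", b"payload", install_counter))
```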

Active extensions are stationary entities that are loaded on demand onto the
base service level. They react to active packet interactions and can implement
complex functionalities such as modification of the data transported by active
packets, modification of the active packet code or extension of the default
forwarding mechanism of the node. The active extension code can be loaded from
a given code server or from another active network node.

Meta-managers are also installed dynamically and are associated with services
or active packets. By default, no integration of meta-managers is performed
and the service is executed normally. This integration depends on the service
design, which specifies which active extension code or active packets need to
be controlled at meta-level. The resulting composition of a base-level service
with its meta-code leads to a new entity that can be seen as a whole.

Service activities are captured and analyzed at meta-level. Such manipulations
are implemented in the meta-managers and depend on the service functionality.
Neither active packets nor active extensions are aware of the associated
meta services, i.e. active packet and active extension code is the same as if
it were running in nodes that support no meta-level. The node architecture
allows meta-managers to exchange information with other meta-managers in other
nodes. Under specific conditions meta-managers can also migrate, following an
active packet flow for example. This depends, however, on the applicability of
the meta-program. In general, a meta-program can be applied to different
applications, but in some cases, service-dependent requirements must be taken
into account.

In the case of a meta-architecture supporting run-time insertion of meta- 
objects, the execution environment integrates mechanisms to manipulate the 
base-level objects and modify their representation in order to integrate meta- 
managers. This approach is of course very powerful because it allows dynamic 
insertion of meta-objects that take complete control of the execution of base- 
level objects and also permits meta-objects to be detached at run-time. However, 
such architectures are very difficult to implement and in the case of mobile code 
environments, they can introduce higher security risks, since internal and low- 
level mechanisms of the EE itself are manipulated by meta-objects. 

We have found that the approach based on load-time integration of meta-objects
is better suited to environments based on the mobile code paradigm. The
main reason is that the integration is done before the execution of the
resulting code in the EE, so that some verifications can be performed at this
level, for instance integrity checks on the resulting modified code or
verifications concerning the origin of the base service and the meta-object
code. Of course, this approach is more restricted than a full run-time
solution, since the integration of meta-objects is performed only once, at
load-time, and meta-objects cannot be separated from base-level entities
afterwards. In the following we describe a prototype that is based on this
approach.

3.1 RANN: A Reflective Active Network Node 

RANN is an active node prototype based on Java. It shows how meta-managers 
can be transparently integrated into the base-level service code and how inser- 
tion of non-functional code is performed. It is based on load-time structural 
reflection providing limited meta-object integration functionality. The reasons 
for this choice are explained here. 

Different approaches can be followed to add sophisticated meta-level
manipulations in Java. Five possibilities can be considered: (a) integration
of meta-level code during the compilation. This approach is called
compile-time MOP resolution [26]; (b) using a customized (modified) Java
Virtual Machine (JVM) that supports a meta-level [8][7][21]; (c) customizing a
Java Just-In-Time compiler to enable reflection as it converts the bytecode
into native machine code [19]; (d) dynamic generation of wrapper classes that
allow interception of method invocation [30][3]; (e) generating a modified
class, while loading the original class bytecode and before it is instantiated
inside the JVM [31][5][4].

Even if solutions (b) and (c) are powerful, they have not been considered 
because they are not portable. They need modifications of the JVM or hardware- 
specific native code extensions. 

Implementations based on (a), with compile-time reflection, can be used to
integrate meta-level code if the service designer performs such integration
before the service is distributed and installed. This makes it possible to
automate the creation of different versions of service code depending on
specific requirements, but it does not allow the more dynamic integration that
will be necessary in mobile code environments. Even if this solution allows a
clean separation of base and meta-service, it is harder to manipulate
different versions of the service code containing minimal modifications, and
to install them in the active nodes. Compile-time techniques also require the
service source code and in general need some (minimal) modifications of this
code (such as directives used as entry points, to allow the code preprocessor
to integrate the meta-program). For those reasons, we have chosen to show a
more dynamic solution based on load-time integration, which is better suited
to mobile code applications. However, combinations of all techniques must not
be excluded and have to be studied in future work.

The current implementation of the Reflective Active Network Node (RANN)
is based on the Javassist framework [4], where load-time and compilation
techniques are exploited. Javassist does not require any modification of the
JVM and enables load-time structural reflection for Java applications. This
means that the service bytecode that is loaded on the active node can be
modified in the following ways: the method call mechanism can be tuned, class
attributes can be accessed, methods can be added or removed, new classes can
be created from scratch, the inheritance mechanism can be modified and even
the instantiation mechanism can be changed. This technique can be applied
since the structure of the Java bytecode lends itself to complex manipulations
and the Java class-loading mechanism allows such extensions. Javassist
provides support for more specialized user-defined class-loaders performing
the bytecode modification to insert meta-objects when loading new services on
the node. The "assisted" bytecode modification is actually based on approaches
(d) and (e). Javassist compiles the modified classes at run-time without
resorting to any external compiler and produces immediately loadable bytecode.
Javassist is a very powerful framework that hides low-level bytecode
modifications, giving the programmer the possibility to concentrate on
application design.

RANN allows active packets to install, load and use services that are
transparently controlled by the meta-level integrated with Javassist. Fig. 2
shows some relevant components of the node. For example, the ServiceMediator
is the bridge between active packets and services. It implements the protocol
used to install, load and use services. The ServiceInterface contains the
minimal set of method definitions that allows active packets to use a given
service. Every service must implement this interface and adequately customize
its own methods.

The diagram in Fig. 2 also shows how the active node components interact
with the dynamically installed code. In this example, we suppose that the
service implements its own routing algorithm and that we want to modify the
behavior of the routing service used by active packets. This modification is
handled at meta-level.

RANN services are separated into three different parts that are compiled and
loaded separately:

— The service code: This code implements the base-level service. This cor- 
responds to the RoutingService in the example. 

— The meta-object: The second type of code that is loaded in the node cor- 
responds to all classes implementing the non-functional aspects. In this case, 
we have composed two different meta-objects (MetaManager and Principal- 
MetaManager) that can manipulate information generated by the execution 
of the associated base-level service. 

— The code modifiers: This code is responsible for performing complex
bytecode manipulations that follow a policy specified by the service designer.
They can be seen as service-specific compilers that are dynamically installed
in the active node. They "assist" the loading process of the service code in
order to integrate the non-functional code. Examples of such integrators are
ReflectiveLoader and NetReflectiveLoader.

This separation makes it possible to modify each part of the code
independently and also to customize services that are installed in the active
node. Furthermore, the code modifiers themselves need to be loaded only if
meta-objects must be inserted. This means that by default, services are loaded
without any meta-object. Once the code modifier is installed, it takes
responsibility for loading, modifying and installing the services in the node.
Of course, it is possible to specify the services that must not be modified
and that must be installed normally. A more detailed description of the
service modification process is provided in the following (see Fig. 3).

Fig. 2. UML class diagram of the Reflective Active Node

When installation of the service code (i.e. the RoutingService) is requested
on a RANN node (1), the ServiceMediator interprets the loading policy that
is specified for each part of the service. Depending on this policy, the
service loading process is delegated to the ReflectiveLoader, which is
loaded in the node (2). This loader drives the whole loading and code
adaptation process. In this example, two meta-objects are associated with the
RoutingService (MetaManager and PrincipalMetaManager). The loading of those
meta-objects can be performed by the ReflectiveLoader itself or can be
delegated to another loader. All the bytecode of the service and meta-objects
is retrieved by the loader from some source (potentially from another EE or
a code container) (3). The bytecode is not yet inserted in the JVM (i.e. the
classes are not resolved). CtClass (compile-time class) objects are created by
Javassist for each retrieved class. Those objects contain the representation
of the bytecode of the classes; class bytecode contains enough symbolic
information to construct those representations (4). The ReflectiveLoader
starts the code manipulation process by modifying the representation (CtClass)
of the original RoutingService. For example, a new interface (Metalevel) is
added and methods of the service are redirected to meta-objects
(trapMethodCall). Such modifications are performed while keeping the modified
class consistent (5). Once the modifications and consistency checks are
performed, the code is internally compiled into a modified version of the
RoutingService (6). Finally, the extended service is inserted in the JVM.
Thus, the code that uses the service is not aware of the meta-objects, and
method calls are transparently redirected to them (7).
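
The effect of steps (4) to (7) can be imitated in a few lines of Python. The
sketch below is only an analogy and does not use Javassist or any of its API:
where RANN rewrites a bytecode representation, this version builds a modified
class object at "load" time. The names RoutingService, MetaManager and
Metalevel are borrowed from the text; `reflective_load`, `trap_method_call`
and `next_hop` are hypothetical.

```python
class Metalevel:
    """Marker interface added to every modified service class."""


class MetaManager:
    """Meta-object that receives every redirected method call."""

    def __init__(self):
        self.calls = []

    def trap_method_call(self, target, name, original, args, kwargs):
        self.calls.append(name)                     # meta-level bookkeeping
        return original(target, *args, **kwargs)    # then run base-level code


class RoutingService:
    """Base-level service, written with no knowledge of the meta-level."""

    def next_hop(self, destination):
        return "gw-" + destination


def reflective_load(cls, meta_manager, methods):
    """Return a modified version of cls; cls itself is left untouched."""
    namespace = {}
    for name in methods:
        original = getattr(cls, name)

        # Defaults bind the current name/original for each generated method.
        def trapped(self, *args, _name=name, _original=original, **kwargs):
            return meta_manager.trap_method_call(
                self, _name, _original, args, kwargs)

        namespace[name] = trapped
    # Same class name, extra Metalevel interface, redirected methods.
    return type(cls.__name__, (cls, Metalevel), namespace)


manager = MetaManager()
Modified = reflective_load(RoutingService, manager, ["next_hop"])
service = Modified()
result = service.next_hop("b")
```

Callers use `service` exactly as they would an unmodified RoutingService,
which is the transparency property described in step (7).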

The modifications performed on the service code, such as incorporating new
interfaces at load-time (e.g. the Metalevel interface), avoid having to fix
the service interfaces definitively at design time. For example, the service
developer does not have to implement the service code while taking into
account all the interfaces that will be required after transformation. This
ensures that service code and meta-objects can be compiled separately, and it
is no longer necessary to predefine several interfaces that are complex and
that are not related to the main functionality of the service.



Tracing Service Execution at Meta-Level. In this very simple example, we
show how it is possible to add some kind of debugging code to a service that
is installed on different EEs. In this case, it is possible to specify which
parts of the service should be traced. This is typically an aspect that
cross-cuts an application design. The associated meta-object will send a
message for each activity that is meaningful for debugging, or will store this
information so that it can later be collected for statistical analysis. The
incorporation of such code can be performed by redirecting the methods of
interest of the service to the associated meta-object. The meta-object will
simply send the message and then call the original service method. Using this
mechanism, we are able to add and remove such debugging information at
meta-level, without any intervention on the base-level service.
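
A tracing meta-object of this kind might look as follows in Python. Again this
is a hypothetical analogy rather than RANN code: `TraceMetaObject`,
`trace_methods` and the `enabled` switch are invented for the illustration,
and the redirection is applied to the class object at load time instead of to
bytecode.

```python
import functools


class TraceMetaObject:
    """Stores one record per traced call for later analysis."""

    def __init__(self):
        self.enabled = True
        self.records = []

    def report(self, service, method, args):
        if self.enabled:
            self.records.append((type(service).__name__, method, args))


def _redirect(original, name, meta):
    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        meta.report(self, name, args)              # meta-level: trace first
        return original(self, *args, **kwargs)     # then the original method
    return wrapper


def trace_methods(meta, cls, *names):
    """Redirect only the named methods of cls through the meta-object."""
    for name in names:
        setattr(cls, name, _redirect(getattr(cls, name), name, meta))
    return cls


class RoutingService:
    """Base-level service; its source contains no tracing code."""

    def route(self, destination):
        return "hop-for-" + destination

    def status(self):
        return "ok"


tracer = TraceMetaObject()
trace_methods(tracer, RoutingService, "route")   # only route() is traced
service = RoutingService()
service.route("x")
service.status()
```

Setting `tracer.enabled = False` silences the trace without touching
`RoutingService`, which corresponds to changing the debugging behavior purely
at meta-level.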

Resource Management at the Meta-Level. Resource management is a 
very difficult problem in systems built with mobile code (active network, mobile 
agents, distributed computing). Some systems address this problem by using 
notions like energy[2], money [29] or complex resource reservation mechanism to 
limit resource consumption. Here, we do not provide a solution to this complex 
problem, but we describe how it would be possible to manipulate resource man- 
agement information at meta-level allowing the service designer to better adapt 
the service to particular requirements. We suppose that the EE provides the 
necessary mechanisms to allow resource management. 

By handling such a non-functional aspect at meta-level, two major advantages
can be obtained: (a) it is no longer necessary to handle this information
explicitly at the service code level and (b) the resource management code can
be changed independently of service modifications. To achieve this, services
loaded in RANN are modified by changing the instantiation mechanism. This is
performed by replacing the new statement with a call to a dynamically added
meta-object. Javassist allows this kind of powerful manipulation so that the
creation of an object can be customized.

The meta-object can perform verifications on the current load of local re- 
sources and instantiate different versions of the real requested service. The ser- 
vice functionality must be the same for all versions. For example, one version 
can distribute some computation over several EEs, resulting in lower resource
consumption on the local node but more complex code to handle distribution,
while the other version performs all computation locally, resulting in less
complex code but higher resource consumption.

We do not discuss resource management or accounting in Java itself. The 
current implementation of the JVM does not provide such mechanisms. However, 
we can cite JRes [6], which provides a resource accounting interface for Java 
programs. Interestingly, JRes follows an approach similar to Javassist, since it
is also based on load-time bytecode rewriting techniques. 

4 Related Work 

M0 Messengers This work has been inspired by previous work on the M0 active
network/mobile agent system [28]. In this system, each active packet (called
a messenger) is able to carry code that can be applied to its own data field.
Moreover, some messengers may install procedures (services) that can later be
used by other messengers, which carry only minimal code to load and apply the
procedures to their own data field. The M0 programming language is similar to
PostScript and has additional primitives for synchronization of threads, access
to shared memory, accounting, secured installation and publication of services,
and creation of new messengers on the same platform (i.e. the messenger EE)
and over the network. The messenger paradigm allows meta-level manipulations
since messengers can change their own code and also the service code stored in
local repositories. Using those capabilities, it is possible to install
messenger code compilers inside the active node that modify their own code
and change their behavior [12].



Pronto The Pronto active network platform [10][11] architecture follows the
control-on-demand paradigm. Control code is installed asynchronously and there
is a strong separation between the generic forwarding engine and the
user-installed control programs executed in an EE. The EE consists of a
virtual machine and an environment manager. The environment manager is
responsible for authenticating, installing and managing the programs executed
in its environment and provides universal programmable messaging at services
level. This architecture is very similar to ours. However, the Pronto platform
does not allow customization of installed programs. In fact, in this
architecture, the user-defined programs implement control themselves.



Maude The Maude [20] active network system is a reflective system that uses 
reflection for a different goal. Reflection is used to support analysis of formal 
specifications based on equational and rewriting logic specifications and pro- 
gramming. Maude is used to develop new security models and proof techniques 
that are needed for secure active networks. Its goal is to provide the active net- 
work architecture with a formal language technology that allows formal transfor- 
mation of specifications into mobile code and to support verification of security 
properties. 



Aspect Oriented Programming Aspect Oriented Programming (AOP) [14]
is an emerging research field that takes a more general approach than
meta-architectures to cope with design concerns that cross-cut an application.
In AOP, multiple aspects can each be described in their own special-purpose
high-level aspect language and are finally woven with the application. The
result is a modified version that integrates the necessary code to handle the
non-functional aspects. Aspect languages are composed with the base-level at
compile-time. This implies that the possibility of run-time composition, a
more powerful feature of run-time and load-time meta-object protocols, is
lost. Compile-time reflection is in fact also a powerful mechanism used in
some meta-architectures. But as stated before, this technique limits run-time
adaptation in an environment based on mobile code. Ideally, run-time
integration of meta-objects is the best way to obtain service adaptability,
but for the moment no EE provides this kind of integration.



CORRELATE In [24] the authors point out the problem of defining
application-specific policies with respect to non-functional behavior. That
is, even if a meta-architecture allows a clean separation and can be applied
to several applications, complete independence is not easy to achieve when
some applications need special treatment at meta-level. In those cases,
standard meta-programming does not provide a satisfactory solution. The
authors propose to treat such requirements at a higher level by separating the
base-level, the meta-level and the policies. This approach is called
non-functional policies. The idea is to specify base-level specific
requirements and generate a policy object. At run-time, policies are
interpreted by meta-programs. Thus, the separate policy object is linked both
to the application object and to the meta-object. The meta-program consults
the policy objects whenever application-specific information is required.
This separation is similar to the way that we propose to structure the
services, i.e. the extension of the user-defined loader corresponds in some
sense to the policy objects. The difference is that in CORRELATE a high-level
definition language is used to describe the policies, which are compiled and
transformed into concrete policy objects. In our case, each user-defined
loader is in fact programmed separately and code modification is handled at a
low level.

5 Discussion and Conclusion 

The ideas presented in this paper show how it is possible to enhance adaptability 
of service components that are injected on active nodes. We have used load-time 
structural reflection to modify the code that is installed in active nodes. This can 
be applied to both active extensions and active packets. The kind of manipula- 
tions that we propose here can be adapted and used in other Java-based active 
nodes. In fact, customizing the loading mechanism is a well-known operation
that is widely used, but we think that it can be better exploited to adapt
the services that are installed in active nodes. The possibility of developing
and injecting the code modifiers and the meta-programs separately is
interesting for service design. Of course, performance issues concerning the
dynamic bytecode modification (compilation time) and the multiple redirections
that are added to the application are important. However, we think that
allowing such integrations, which are applied only to a limited part of the
service, is more interesting than integrating the same code in every service
implementation that is installed everywhere, as in the compile-time approach.

The current implementation of RANN limits the adaptation of a service: we
are not able to manipulate or control the execution of the method itself, and
the modifications to service behavior are limited. It shows, however, that
load-time structural reflection can be applied to service adaptation in active
networks. Using load-time reflection nevertheless poses some problems. Even if
no source code is needed to perform the dynamic adaptation of the service in
the active node, it is necessary to know precisely how the service code is
structured. Even if the Javassist framework hides complex low-level bytecode
manipulation, it does not provide a higher-level abstraction for automating
meta-programming. Thus, each service adaptation code must be carefully
programmed. We have not studied the security implications of service
adaptation performed directly at load-time. We think, however, that even if
modifying service code in the EE seems to open the way to some kinds of
attacks, the problem is not very different from that of a static compilation
process or from the security implications that are inherent to the usage of
mobile code. Thus, security problems remain but are located at another level.

This paper describes the need for flexible structuring mechanisms in the world 
of active networks. Meta-level manipulation is a promising way to achieve this,
especially because of the resulting openness, which enables the programmer to 
customize the structure of the system. In particular, we wanted to show: (1) that 
it is necessary to be able to add non-functional aspects to already deployed active 
network services, and (2) that a solution, with a clean layering of orthogonal 
responsibilities, is made possible by meta-level architectures.



Abstract. This paper introduces the notion of self-specializing mobile
code, a program that is transmitted over a network and automatically
customizes itself to its destination. The benefit of this approach, higher
performance without compromising safety, is achieved through a
combination of program specialization and program verification. We
show how self-specializing mobile code can be used to create an adaptive 
network service, i.e., a service that is transmitted to a destination, where 
it adapts itself with respect to properties it detects on the destination 
node. 

We have designed and implemented an active network, called FAST- 
net, that is Fast, Adaptive, Safe, and Typed. FASTnet is based on the 
PLANet active network, but accommodates adaptive network services. 
Experimental results show adaptive network services can run over 4 times 
faster than normal network services without compromising the safety of 
the network. 



1 Introduction 

We use self-specializing mobile code to implement adaptive network services. 
In this section, we introduce self-specializing mobile code and describe its two 
key programming language technologies, program specialization and program 
verification. We then explain how self-specializing mobile code can be used to 
implement adaptive network services. 



1.1 Self-Specializing Mobile Code 

Mobile code refers to programs that are written on one machine, transmitted over 
a network, and executed on another machine. Mobile code has recently gained 
widespread popularity since it greatly simplifies code distribution, making more 
software available to more people. The World Wide Web, and its straightforward 
browser interface, enables users to easily or even automatically download code 
from all over the world and execute it on their own machine. Examples include 
browser plug-ins and Java applets. Since users may not trust the mobile code or 
may not even know that they have downloaded it, adverse effects can be avoided
by first verifying the code before executing it. For example, certain types of
viruses can be caught this way. 

The “write once, run anywhere” notion of mobile code greatly reduces the 
burden on the programmer. Instead of writing and maintaining a different pro- 
gram for each different context in which the program may be run, there is only 
one version of the code which can be executed in all of the contexts. In order to 
execute everywhere, though, the program needs to be general enough to work 
correctly in all of the different contexts. This generality typically introduces a 
performance overhead. 

We propose a solution to this conflict between generality and performance 
using a technique we call self-specializing mobile code. Self-specializing mobile 
code allows the programmer to write a single program that can be transmitted 
over a network to a destination, where the program safely adapts itself to its 
context. Instead of transmitting regular code over the network, the key is to 
transmit a safe code generator. A safe code generator consists of transformations 
that enable it to produce customized code and types that enable its safety to be 
guaranteed. This novel technique is achieved by combining two key programming 
language technologies, program specialization and program verification, each of 
which is described below. 



Program Specialization Program specialization, also known as partial eval- 
uation, is a technique that optimizes a program with respect to the context in 
which the program is executed [8]. In particular, all computations that only de- 
pend on invariants in the context are performed during a first phase, producing 
an optimized program which is executed in a second phase. Dramatic speedups 
have been shown for a wide range of applications, including programming lan- 
guage interpreters, graphics tools, and operating system components [2,13,16]. 

Run-time code generation is a type of program specialization that generates 
executable code [1]. The code generation process is lightweight, since expensive 
analysis and compilation phases are performed off-line during a preprocessing 
phase. This technique can be used to create self-specializing mobile code — a 
program that is transmitted over a network and automatically customizes itself 
to its destination. Specifically, the transmitted program accesses values in the 
destination’s context and generates optimized code based on these values. 



Program Verification Program verification, a technique that analyzes a pro- 
gram to determine properties about the program, is the other key component 
in self-specializing mobile code. First of all, since the source of the mobile code 
may not be trusted, the user may wish to determine that the program will not 
perform any undesirable behavior during its execution. Misbehaving programs, 
such as viruses, could delete files from the hard drive, access confidential infor- 
mation, or self-replicate themselves an exponential number of times. Other types 
of errors that may cause a program to crash, e.g., buffer overflows or null pointer 
dereference, can also be detected. The Java byte-code verifier, for example, is
intended to detect these types of behaviors ahead of time, so that the user can
safely execute any program that passes the verifier. 

Even if a program is trusted, though, there are additional motivations for 
using program verification. For example, dynamic linking can be problematic 
due to incompatibilities between a dynamically loadable library and a running 
application. Proper dynamic linking can be ensured by requiring the dynami- 
cally linked program and the running program to contain interfaces specifying 
their imports and exports, which are checked at link time [6]. As well, a self- 
specializing program is much more complicated than a normal program, and is 
therefore much more error prone. Type systems have been designed to provide safety
guarantees about a self-specializing program, e.g., that all of the code that it may 
generate will be well formed [10,15,17]. A technique used by certifying compilers 
verifies properties after the program is compiled but before it is executed, which 
guarantees that the compiler did not introduce illegal transformations and that 
the resulting code is still type-safe [12]. 

1.2 Adaptive Network Services 

Active-network services are good candidates for self-specializing mobile-code. 
Like mobile code in general, it is desirable to implement a single service, which 
could then be deployed on many machines throughout a network. An active- 
network node typically contains many constant values in its state, but these 
values may differ from one node to another. For example, each node has its 
own network address, routing table, etc. These values can be used to customize 
network services, but since the values are all different, they have to be customized 
differently for each node. 

The infrastructure we propose is lightweight and efficient. A network service 
that automatically adapts itself to its destination is self-contained and requires 
no additional compiler support on the destination node. All of the optimizations 
and code generation are built into the network service itself. Also, code genera- 
tion is fast and the code produced is high-quality, which means that the time to 
generate code is quickly amortized after executing the generated code. 

There are a number of different types of invariants that network services 
can exploit: node, protocol, and locality invariants. A node invariant is a value 
that is unique to the machine, its operating system, or applications installed on 
the machine. Network addresses and routing tables, as previously mentioned, are 
examples of node invariants. A protocol invariant is a value that remains constant 
for a certain period of time due to the semantics of a particular protocol. For 
example, TCP is a session-oriented protocol that maintains invariants for the 
duration of a connection. A locality invariant exploits the fact that certain values 
that were seen in the past will likely be seen again in the future. Techniques
such as code-caching exploit this type of invariant.

1.3 Summary

We have designed and implemented an active network that uses self-specializing 
mobile code to implement adaptive network services. The rest of the paper de- 
scribes the main components in our system and demonstrates how it can be 
used to increase the performance of network services without compromising the 
safety of the network. Section 2 provides an overview of Cyclone, our system 
that produces self-specializing mobile code. In Section 3, we present FASTnet, 
an active network that accommodates adaptive network services. Experimental 
results are described in Section 4, and concluding remarks are given in Section 5. 



2 Cyclone 




Fig. 1. Overview of the Cyclone certifying compiler 



Self-specializing mobile code can be automatically derived from a normal 
program using a special certifying compiler, named Cyclone, that we have de- 
signed and implemented. Cyclone is the only certifying compiler that performs 
run-time code generation. The three phases of the Cyclone system are shown in 
Figure 1. We use C as the input language and x86 executable code as the output. 
Intermediate representations consist of a source language, also called Cyclone, 
and the TAL/T assembly language. A full description of the system can be found 
in [7]. The rest of this section describes these three phases. 

The first phase of our system translates a type-safe C program into a Cyclone 
program. Cyclone is a source level programming language, like C, but it contains 
additional constructs to perform code generation. These constructs specify which 
parts of code will be executed early, during the code-generation, or late, during 
the subsequent execution. Early constructs are those that only depend on values 
that remain invariant in the destination context, while late constructs depend 
on values obtained elsewhere. The translation of the first phase is accomplished 
by analyzing the original program, along with an abstract description of the 
destination context, to produce the Cyclone program. 

An example of this translation is given in Figure 2. Part (a) contains a normal 
C program that performs modular exponentiation. If the exponent and modulus 
values are constant in the destination context, they can be used to generate an 
optimized program in which all of the operations that depend on these values are 
computed. Part (b) shows the corresponding Cyclone program that is generated, 
where special constructs (codegen, cut, splice, fill) have been introduced to 






Luke Hornof 



(a) C code (invariant arguments in italics)

int mexp(int base, int exp, int mod)
{
    int res = 1;
    while (exp != 0) {
        if ((exp & 1) != 0)
            res = (res * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return (res);
}



(b) Cyclone code

int (int) mexp_gen(int exp, int mod)
{
    return codegen(
        int mexp_sp(int base) {
            res = 1;
            cut
                while (exp != 0) {
                    if ((exp & 1) != 0)
                        splice res = (res * base) % fill(mod);
                    splice base = (base * base) % fill(mod);
                    exp >>= 1;
                }
            return (res);
        });
}



Fig. 2. C to Cyclone translation 



exploit the constant values. Constructs that will be computed early are shown in 
italics; the rest will appear in the optimized program. In our example, executing 
the loop early generates some number of assignments to res and base that will 
be executed late. Within each assignment, the modulus value is computed early 
and filled in. 
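
To make the effect of the early loop concrete, suppose the destination context fixes exp = 5 (binary 101) and mod = 7; these values are chosen only for illustration. Running the loop early for three iterations would leave straight-line code like the second function below. The name mexp_sp_5_7 is hypothetical, and the function is written by hand here rather than produced by Cyclone.

```c
#include <assert.h>

/* General version, as in Figure 2(a). */
int mexp(int base, int exp, int mod)
{
    assert(mod > 0);
    int res = 1;
    while (exp != 0) {
        if ((exp & 1) != 0)
            res = (res * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return res;
}

/* Hand-written residual code for exp = 5, mod = 7: the loop and the
 * tests on exp are gone, and the modulus appears as a constant. */
int mexp_sp_5_7(int base)
{
    int res = 1;
    res  = (res * base) % 7;    /* bit 0 of exp is 1 */
    base = (base * base) % 7;
    base = (base * base) % 7;   /* bit 1 of exp is 0: res is not updated */
    res  = (res * base) % 7;    /* bit 2 of exp is 1 */
    base = (base * base) % 7;
    return res;
}
```

For any base, mexp_sp_5_7(base) computes the same value as mexp(base, 5, 7), but without evaluating any condition on exp at run time.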

The second phase compiles a Cyclone program into Typed-Assembly Lan- 
guage with Templates, called TAL/T. Types in TAL/T are used to verify that a 
program is type-safe. Templates are the code fragments from which the
specialized program will be created, and are the result of compiling the late
Cyclone instructions. The early Cyclone instructions are compiled into the glue code that
instantiates and assembles templates. 

An excerpt of the compiled Cyclone program given in Figure 2 is shown in 
Figure 3. The first template, which simply initializes the variable

_mexp_gen:
    CGSTART                                  ; codegen(...
    CGDUMP  ECX,cdgn_beg$18                  ; (dump 1st template)
    JMP     whiletest$21                     ; while ...

    TEMPLATE_START cdgn_beg$18,cdgn_beg$19
cdgn_beg$18                                  ; (1st template)
    MOV     [ESP+4],1                        ; int res = 1;
    TEMPLATE_END cdgn_beg$19



Fig. 3. TAL/T code 



res, is located at address cdgn_beg$18. The glue code, at location _mexp_gen, 
includes the instruction CGSTART for starting a new code generation and CGDUMP 
for dumping the contents of a template. 
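
Portable C cannot emit machine code, so the sketch below only imitates the division of labour between glue code and templates. The glue loop runs early over exp and appends instances of two templates to a buffer, filling the mod hole in each; a small executor then stands in for running the generated code. The types and names (op_t, instr_t, program_t, mexp_gen, run) are hypothetical and do not correspond to the TAL/T instruction set.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_MULRES, T_SQUARE } op_t;   /* the two late statements */

typedef struct {
    op_t op;
    int  mod;    /* hole, filled in at generation time */
} instr_t;

#define MAX_INSTRS 64   /* a 32-bit exponent yields at most 2 per bit */

typedef struct {
    instr_t code[MAX_INSTRS];
    size_t  len;
} program_t;

/* Glue code: runs the loop over exp early and dumps template instances. */
static void mexp_gen(program_t *p, unsigned exp, int mod)
{
    assert(mod > 0);
    p->len = 0;
    while (exp != 0) {
        if ((exp & 1u) != 0) {
            assert(p->len < MAX_INSTRS);
            p->code[p->len++] = (instr_t){ T_MULRES, mod };
        }
        assert(p->len < MAX_INSTRS);
        p->code[p->len++] = (instr_t){ T_SQUARE, mod };
        exp >>= 1;
    }
}

/* Stand-in for executing the generated code: no test on exp remains. */
static int run(const program_t *p, int base)
{
    int res = 1;
    for (size_t i = 0; i < p->len; i++) {
        if (p->code[i].op == T_MULRES)
            res = (res * base) % p->code[i].mod;
        else
            base = (base * base) % p->code[i].mod;
    }
    return res;
}
```

Generation happens once per (exp, mod) pair; run can then be called for many values of base, which is the situation in which the generation cost is amortized.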

The third phase assembles a TAL/T program into a self-specializing mobile 
code program, which consists of x86 machine code and its corresponding type 
information. This program can be sent over a network. When it arrives at its 
destination, the type information will be used to verify that the program and 
any program that it may generate are type-safe and then the program will gen- 
erate a specialized version of the original function. In our example, a modular 
exponentiation function optimized with respect to exponent and modulus values
found on the destination node will be generated. 



3 FASTnet

We have determined that active-network services are good candidates for self-
specializing mobile code. To demonstrate this, we have implemented an active
network called FASTnet. FASTnet is a Fast, Active, Safe, and Typed version of
PLANet, an active network in which all packets are programs and routers can
be extended dynamically [3,4,5].

All network communication in FASTnet is performed with PLAN programs.
Each node, therefore, contains a PLAN interpreter that executes PLAN pro-
grams as they are received. PLAN programs are sent between nodes over
UDP/IP using a simple ASCII wire representation.

In order for a node to be able to route packets to other destinations in
the network, we compute a routing table using a simplified form of link-state
routing [9]. The first phase consists of a distributed protocol in which nodes
iteratively exchange information with each other until each node builds up a
copy of the entire network topology. The second phase consists of each node
computing the shortest path from itself to each of the other nodes in the network,
and updating the routing table accordingly.

[Figure 4 labels: source, Cyclone, network; node components: PLAN interpreter,
network services, dynamic linker, verifier, node state; steps: 1. create,
2. transmit, 3. verify, 4. link, 5. self-specialize, 6. execute]

Fig. 4. Steps in deploying and executing an adaptive network service 



There are a number of services built into each node. Networking services
perform functions such as returning the source address of a PLAN program or 
returning the interface on which a PLAN program arrived. Other services can 
query and update network topology information, and are used to implement the 
link-state routing protocol. As well, general services are also available to perform 
functions such as list-processing and I/O. 
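
The shortest-path phase of the link-state routing protocol is a single-source shortest-path computation over the topology each node has collected. The paper does not say which algorithm FASTnet uses; the sketch below assumes Dijkstra's algorithm over a small adjacency matrix and records, for each destination, the first hop to use. N, INF, and compute_routes are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

#define N   4          /* number of nodes in the example topology */
#define INF INT_MAX    /* marks "no link" and "not yet reached" */

/* cost[i][j] is the link cost from i to j, or INF if there is no link.
 * On return, dist[v] is the shortest distance from src to v and
 * next_hop[v] is the neighbour of src on that path (-1 if none). */
static void compute_routes(int cost[N][N], int src,
                           int dist[N], int next_hop[N])
{
    assert(src >= 0 && src < N);
    int done[N] = {0};
    for (int v = 0; v < N; v++) {
        dist[v] = INF;
        next_hop[v] = -1;
    }
    dist[src] = 0;

    for (int iter = 0; iter < N; iter++) {
        /* Pick the closest reachable node that is not yet finalized. */
        int u = -1;
        for (int v = 0; v < N; v++)
            if (!done[v] && dist[v] != INF && (u < 0 || dist[v] < dist[u]))
                u = v;
        if (u < 0)
            break;              /* remaining nodes are unreachable */
        done[u] = 1;

        /* Relax every link leaving u. */
        for (int v = 0; v < N; v++) {
            if (done[v] || cost[u][v] == INF)
                continue;
            if (dist[u] + cost[u][v] < dist[v]) {
                dist[v] = dist[u] + cost[u][v];
                next_hop[v] = (u == src) ? v : next_hop[u];
            }
        }
    }
}
```

The next_hop array plays the role of the routing table: a packet for destination v is forwarded to next_hop[v].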

The novel component of FASTnet, however, is its ability to dynamically 
load adaptive network services consisting of self-specializing mobile code. The 
way this works is shown in Figure 4. First, the self-specializing mobile-code 
program is created on the source node, as described in Section 2. It is sent over 
the network by including it in a PLAN program. When the PLAN program 
arrives at the destination node it is interpreted by the PLAN interpreter. A 
special load_service PLAN function verifies the mobile code to ensure that it 
is type-safe and dynamically links it into the destination node. The service then 
reads values from the destination’s state and generates specialized code based 
on these values. At this point, the specialized code can be executed. As well, the 
specialized service is persistent; it is available to subsequent PLAN programs 
that wish to use it. 

4 Experimental Results 

We studied the improvement adaptive network services can provide by imple- 
menting and deploying two different types of services in our active network. 
Experiments were run on a 400 MHz Pentium II PC with 128 MB of main 
memory running RedHat Linux 6.1. This section describes our results.

4.1 Public-Key Algorithms

Public-key algorithms are popular due to their security, convenience, and flex- 
ibility. They can be used to perform a number of different security functions, 
such as encrypting data to guarantee its privacy or signing data to ensure its
integrity [14]. The algorithm works by generating a pair of keys: a private key, 
known only to a particular participant and not shared with anybody else, and 
a public key, published publicly so that it is available to everyone. Encrypting 
data intended for the specific participant is achieved by using the participant’s 
public key; only the corresponding private key can decrypt the data. Signing 
data requires a participant to encrypt data with his private key; everyone else 
can verify his authenticity by decrypting the data using the corresponding public 
key. 
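
As a concrete instance of the encrypt/decrypt and sign/verify symmetry, the sketch below uses RSA with a deliberately tiny textbook key: n = 3233 = 61 × 53, public exponent 17, private exponent 2753. These numbers are an illustrative assumption and are far too small for real use; the helper names (modpow, rsa_encrypt, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Square-and-multiply modular exponentiation. The 64-bit products cannot
 * overflow as long as mod fits in 32 bits. */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod)
{
    assert(mod != 0);
    uint64_t res = 1 % mod;
    base %= mod;
    while (exp != 0) {
        if (exp & 1)
            res = (res * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return res;
}

/* Toy key pair: (RSA_N, RSA_E) is public, RSA_D is private. */
enum { RSA_N = 3233, RSA_E = 17, RSA_D = 2753 };

/* Anyone can encrypt with the public key; only the owner can decrypt. */
static uint64_t rsa_encrypt(uint64_t m) { return modpow(m, RSA_E, RSA_N); }
static uint64_t rsa_decrypt(uint64_t c) { return modpow(c, RSA_D, RSA_N); }

/* Only the owner can sign; anyone can verify with the public key. */
static uint64_t rsa_sign(uint64_t m)    { return modpow(m, RSA_D, RSA_N); }
static uint64_t rsa_verify(uint64_t s)  { return modpow(s, RSA_E, RSA_N); }
```

All four operations reduce to modular exponentiation with a fixed exponent and modulus, which is why the key is a natural value to specialize on.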

As network security becomes more important, it might be desirable to add 
some type of public-key algorithm to an active-network node. For example, email 
can be encrypted to ensure its privacy or a key server may wish to sign the keys
it distributes to ensure their integrity. In this scenario, each node in a network
would generate a public and a private key, making the former publicly available 
while keeping the latter a secret. 

A public-key algorithm fits well into a framework that accommodates adap- 
tive network services. Each node contains some node-specific data, its private 
key, which is different from every other node, but can be used to optimize the 
public-key algorithm. Creating a self-specializing mobile code program that can 
be transmitted to a node and then specialized with respect to the private key en- 
ables a single generic service to be deployed throughout a network and adapted 
to each node. 




Fig. 5. RSA encryption experiment

In order to test this idea, we wrote a normal version of RSA encryption
and a self-specializing version, and encoded each of them as services in our 
FASTnet active network [14]. The main function that RSA performs is modular 
exponentiation, the function shown in the example in Figure 2. Although the 
example uses 32-bit integer values to simplify the presentation and real RSA
uses much larger values (our tests were run with 64-byte values), the idea is the
same. The private key corresponds to the exponent and modulus values and is 
used to specialize the function. 

We assessed the advantage of self-specialization by measuring execution times 
of each of the two different versions. The normal version performs encryption 
in 226 microseconds while the self-specializing version takes 150 microseconds, 
yielding a speedup of 1.5. The self-specializing version, however, requires an 
initial 975 microseconds to generate the optimized version. 

Therefore, the self-specializing version is only advantageous when it is exe- 
cuted enough times to amortize the time spent generating optimized code. We 
plotted this information in Figure 5. From this figure, we see the self-specializing 
version is faster (as seen by the slope of the lines) but requires the initial code- 
generation phase (as seen by the non-zero y-intercept). From this graph, we can 
see that the break-even point is around 15 iterations, i.e., if the encryption algorithm
is executed more than 15 times with the same private key, the self-specializing 
version will be faster. As the number of messages encrypted increases, the time 
spent generating code becomes negligible. 

4.2 Interpreters 

When designing parts of a complex system, such as a network, it is often useful 
to introduce a new programming language. The two main techniques used to 
implement a new language are writing an interpreter or writing a compiler. 
Although a compiler will execute its compiled code faster, an interpreter is often 
implemented because it is much simpler. For example, PLAN interpreters can be 
found in active networks and packet filter interpreters can be found in traditional 
networks [5,11]. 

Program specialization can be used to resolve the conflict between the simplicity
of an interpreter and the efficiency of a compiler [16]. Specializing an interpreter 
with respect to a specific program achieves a result similar to compilation since 
the interpretation overhead is eliminated. This is an example of exploiting a 
locality invariant, since it is often the case that once a program is executed, it 
is likely to be executed again. 

For our experiment, we implemented a byte-code interpreter for a small stack- 
based language. The language includes instructions for manipulating the stack 
(e.g., push, pop), accessing memory (e.g., load, store), performing arithmetic 
operations (e.g., addition, subtraction), and for control flow (e.g., compare, jump 
if equal to zero). 
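
A minimal sketch of such an interpreter follows. The paper does not list its instruction set in detail, so the opcodes, encoding, and names here (OP_PUSH, OP_JZ, insn_t, interp, and so on) are assumptions chosen to match the categories above. The dispatch on the opcode in every iteration is the interpretation overhead that specializing with respect to a fixed program would remove.

```c
#include <assert.h>

typedef enum {
    OP_PUSH, OP_POP,      /* stack manipulation */
    OP_LOAD, OP_STORE,    /* memory access */
    OP_ADD,  OP_SUB,      /* arithmetic */
    OP_JZ,   OP_JMP,      /* control flow */
    OP_HALT
} opcode_t;

typedef struct { opcode_t op; int arg; } insn_t;

#define STACK_MAX 32
#define MEM_SIZE  8

/* Runs code until OP_HALT and returns the value on top of the stack. */
static int interp(const insn_t *code, int mem[MEM_SIZE])
{
    int stack[STACK_MAX];
    int sp = 0, pc = 0;
    for (;;) {
        insn_t in = code[pc++];
        switch (in.op) {
        case OP_PUSH:  assert(sp < STACK_MAX); stack[sp++] = in.arg; break;
        case OP_POP:   assert(sp > 0); sp--; break;
        case OP_LOAD:  assert(sp < STACK_MAX); stack[sp++] = mem[in.arg]; break;
        case OP_STORE: assert(sp > 0); mem[in.arg] = stack[--sp]; break;
        case OP_ADD:   assert(sp > 1); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_SUB:   assert(sp > 1); sp--; stack[sp - 1] -= stack[sp]; break;
        case OP_JZ:    assert(sp > 0); if (stack[--sp] == 0) pc = in.arg; break;
        case OP_JMP:   pc = in.arg; break;
        case OP_HALT:  assert(sp > 0); return stack[sp - 1];
        }
    }
}
```

A specialized version for one fixed program would contain only the bodies of the executed cases, in program order, with the switch and the pc bookkeeping resolved ahead of time.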

We measured the performance of a normal interpreter and a self-specializing 
interpreter. The normal interpreter evaluated a 60-instruction program in 10.8
microseconds, while the self-specializing interpreter only spent 2.5
microseconds, a speedup of 4.3. The time spent optimizing the interpreter was
281 microseconds, which means the break-even point is about 35 executions. We
plotted this data in Figure 6.

Fig. 6. Byte-code interpreter experiment

Compared to the RSA encryption results presented earlier, the speedup for 
the interpreter is much higher, although the break-even point takes longer to 
reach. A code-caching technique that saved specialized versions of programs 
would pay off for any program that was executed more than 35 times. 
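
The break-even points quoted for both experiments follow from the measured times: specialization pays off once n × t_normal exceeds t_gen + n × t_spec, that is, when n > t_gen / (t_normal - t_spec). The helpers below, whose names are hypothetical, evaluate this with the figures reported in Sections 4.1 and 4.2. They give roughly 12.8 executions for RSA and 33.9 for the interpreter, close to the approximate values of 15 and 35 quoted in the text.

```c
#include <assert.h>

/* Number of executions at which the one-time generation cost t_gen is
 * recovered, given per-execution times t_normal and t_spec (same unit). */
static double break_even(double t_gen, double t_normal, double t_spec)
{
    assert(t_normal > t_spec);   /* otherwise specialization never pays off */
    return t_gen / (t_normal - t_spec);
}

/* Speedup of the specialized version once generation cost is ignored. */
static double speedup(double t_normal, double t_spec)
{
    assert(t_spec > 0.0);
    return t_normal / t_spec;
}
```

For example, break_even(975.0, 226.0, 150.0) corresponds to the RSA experiment and break_even(281.0, 10.8, 2.5) to the byte-code interpreter, with all times in microseconds.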

5 Conclusion 

We have designed and implemented an active network that uses self-specializing 
mobile code for adaptive network services. These services are transmitted over a 
network and can automatically adapt themselves to their destination. Program 
verification is employed to check that untrusted programs do not misbehave, that 
dynamic linking is done correctly, and to ensure that the code generators pro- 
duce well-formed code. Program specialization is used to yield the performance 
improvements. Experimental results show that adaptive network services can 
run over 4 times faster than traditional services without sacrificing any network 
safety. 
Abstract. Adding programmability to the interior of the network provides 
an infrastructure for distributed applications. Specifically, network 
management and control applications require access to and control of 
network device state. For example, a routing load balancing application 
may require access to the routing table, and a congestion avoidance 
application may require interface congestion information. There are 
fundamental problems associated with this interaction. 

In this paper, we study the basic tradeoffs associated with the interaction 
between an active process and its environment and present ABLE++ 
as an example architecture. Most notably, we explore two design trade- 
offs, efficiency vs. abstraction and application flexibility vs. security. We 
demonstrate the advantages of the architecture by implementing a con- 
gestion avoidance algorithm. 



1 Introduction 

In active networks [12], network elements, such as routers, are programmable. 
Code can be sent in-band and executed at the router rather than just at the 
edge nodes. The rationale behind active networks is that moving computation 
from the network edges to its core facilitates more efficient use of the network. 
Many of the suggested applications [17], such as congestion control and adap- 
tive routing, require the active code to be aware of local state information in 
the router. When active networks are used for network management [16,14,6], 
interfacing with the managed router is even more important, because manage- 
ment applications need efficient monitoring and control functions. For example, 
a routing load balancing application may require access to the routing table, and 
a congestion avoidance application may need interface congestion information. 
Efficient and secure access to managed device state is especially needed when 
the managed router is logically, and to a greater extent, physically, separated 
from the management active environment [14]. 

The design of a control/monitoring interface to a router must balance 
abstraction against efficiency. An interface with a low level of abstraction, 
such as the Simple Network Management Protocol (SNMP) [15], can be efficient at 
the device level but provides a cumbersome programming infrastructure at the 
application level. Conversely, an interface with a high level of abstraction, 
such as CORBA [18], provides simple programming constructs but is often 
inefficient. Another design tradeoff we address with our system is application 
flexibility versus security vulnerabilities. For example, allowing an active routing 
application to change the forwarding tables at a router may result in global 
routing instability. 

* This work was done while in Bell-Labs, Lucent Technologies. 

H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 114-129, 2000. 

Our goal is to present a simple and abstract interface to the application pro- 
grammer while maintaining the efficiency of the system which requires visibility 
of operation costs. We address this tradeoff by using a cache and an efficient ab- 
stract design. We balance the latter tradeoff by addressing security at multiple 
levels of our system. 

To make the discussion clearer and more concrete, we demonstrate our design 
by describing novel extensions to the ABLE architecture [13], ABLE++. 
ABLE is an active network architecture that allows the easy deployment of 
distributed network management (NM) applications. In ABLE++, we optimize the 
interaction between the active process and the router by adding a local cache 
at the device level to store local router data. Caching information helps in more 
than one way to alleviate the load from the core router software where comput- 
ing power is scarce. Obviously, it allows a single data retrieval to be used by 
several applications. In addition, some of the popular information we cache is 
’computed’ from consolidating many different ’atomic’ data units, e.g., a list of 
all neighbors. Consolidating cached items serves several aims: it reduces the load 
from the router, shortens the retrieval time for applications, and simplifies the 
writing of management applications. 

Our interface eliminates the management instrumentation details (e.g. 
SNMP, CLI, or CMIP) from the application layer without sacrificing efficiency. 
In a similar way, we abstract the control interface that allows privileged appli- 
cations to modify the router state. The control capability introduces security 
hazards, which we address by placing security mechanisms such as authenti- 
cation and application classification at various layers in the architecture. Each 
application is authorized to use specific services (read and write) and resources 
(CPU time, bandwidth, memory). Our interfaces check for conformance to these 
restrictions. 

To demonstrate the usefulness of our design, we present an implementation 
of a congestion avoidance application similar to the one suggested by Wang [19]. 
The application constantly monitors the load of the router interfaces. When 
congestion is detected, the algorithm attempts to reroute some flows around the 
congested interface. The algorithm works locally: a router exchanges messages 
with its 2-neighborhood routers (the group of routers that are no more than two 
hops away from it) to find a deflection route (similar ideas were also suggested 
in [3]) for some of the flows that pass through the congested interface. The 
algorithm does not change the routing tables, but instead adds temporary entries 
to the forwarding tables, unlike Wang’s original algorithm, which uses tunneling at 
the per-packet level. 
Related Work The IEEE P1520 [1] is an ongoing effort to standardize the interface 
between the managed element and the control software. The current drafts 
define three levels of interfaces: COM between the hardware and the control 
software (drivers), L (low level) between the control level software and the low 
level implementation software (IP routing, RSVP), and U (upper level) between 
the low level software and the high level services. 

Since IEEE P1520 is not yet standardized, other standards are being 
used. Because SNMP is so widespread, it is clearly an attractive current candidate 
for a standard agent-node interface. However, with a few exceptions [8,20], SNMP 
is missing from most past system designs. Recently, there were several suggestions 
for agent systems that incorporate an SNMP interface [21,11]. 

Organization In the next section we present the new ABLE++ architecture. 
Section 3 discusses the system performance, and Section 4 discusses security 
issues. In Section 5 we demonstrate the advantages of our interface design with 
an application example. 



2 Architecture 

Network management applications require an infrastructure to query and set 
state on managed devices. Current management models support both operations 
but in a very inefficient and platform dependent manner. We have addressed 
both inefficiencies by expanding the ABLE active engine architecture to include 
three processing Brokers (see Figure 1): a Session Broker, an Info Broker and a 
Control Broker. 

“Brokers” are used to complete a task on behalf of a requester. We have spe- 
cialized the concept of a “Broker” to handle three different active processing and 
network management tasks. The Session Broker creates, manages, and cleans up 
the processing environment and manages active sessions. The Info Broker 
provides an efficient monitoring channel by exporting local state to active ses- 
sions and mediating all queries for local state. The Control Broker provides a 
secure channel for control operations by exporting a set of control methods and 
associated policies. 

Process management, information queries, and control requests are each han- 
dled by different Brokers because each requires a different set of security re- 
strictions and/or performance metrics. The Session Broker must have complete 
monitoring and control access over all active sessions to prevent excess resource 
usage and facilitate communication between active sessions on different nodes. 
The Info Broker’s main function is to provide efficient access to local data. Since 
accessing data through this channel is read-only, the Info Broker does not in- 
troduce the same security concerns as the write-enabled control channel. Hence, 
the design focus in the Info Broker is on efficient access to data. On the other 
hand, the Control Broker does introduce many security concerns because active 
sessions can change the state of the device. The design focus here is on prevent- 
ing active sessions from leaving the network in an inconsistent state. All three 
Brokers communicate with active sessions using TCP or UDP sockets. Therefore, 
ABLE++ can support active applications written in any language that supports 
the socket interface. The following sections will discuss in more detail the design 
goals and architecture of each Broker. 

2.1 The Session Broker 

The Session Broker manages the initiation, behavior, communication and termi- 
nation of all active sessions. Conceptually it is a meta-Broker as it can be viewed 
as giving session control services to the system. The communication aspect of 
the Broker is the only non-meta service it performs for the session, and thus 
might call for a separate entity. However, the separation introduces inefficiencies 
in handling out-going messages and thus was left for further research. 

Most of the functionality of the Session Broker was inherited from the orig- 
inal design of the ABLE active engine [14]. Therefore, we will only go into a 
brief discussion. Both the ABLE and ABLE++ architectures are designed to support 
long-lived, mobile active sessions. Network management applications must 
be long lived to implement any ongoing monitoring services. Mobility is essential 
to propagate a distributed application throughout the network. ABLE++ 
allows active sessions to control code distribution. Therefore, the Session 
Broker’s responsibilities can be divided into two parts: managing active sessions 
(the meta-Broker) and exporting mechanisms for communication and mobility. 

Managing active sessions The Session Broker initializes, monitors and terminates 
active sessions. During the initialization phase, local resources (CPU, memory 
or bandwidth) are allocated to each session. “Watch dogs” or aging timers 
monitor all activities to prevent excess resource consumption or prevent session 
termination without freeing its assigned resources. All active sessions are registered 
with our Security Module. The Session Broker can terminate an active 
session if it uses too many resources or attempts to violate security restrictions. 
We will discuss security in ABLE++ in Section 4. 

Communication and mobility The Session Broker controls active session communication 
and mobility through an exported API (see Table 1). Distribute() and 
DistributeAll() are used to propagate an application to every active node in 
the network. The Session Broker will prevent multiple copies of the same application 
on a single node. Distribute(Addr) is used to send an application to a specific 
spot in the network. This function is especially useful for monitoring only specific 
regions of the network for bottleneck or congestion detection. send(byte[]) 
and byte[] receive() are standard communication functions used to pass 
messages between active sessions, whether on the same node or on different nodes. We 
have also included a specialized report sending function, sendReport(String, 
DestAddr, DestPort). Event or alarm reporting is a crux of network management. 
Often, reports are the only ongoing documentation of the network’s 
behavior. Therefore, we have added a specialized function for reporting. 

Table 1. Communication and Mobility Interface 

Name                                        Description 
int Distribute():                           Sends program and data to neighbors except original 
                                            sender. Returns the number of successful messages. 
int DistributeAll():                        Sends program and data to all neighbors. 
                                            Returns number of successful messages. 
int Distribute(Addr):                       Sends program and data to Addr. 
                                            Returns number of successful messages. 
byte[] Receive():                           Receives a packet. 
void Send(byte[], DestAddr):                Send packet to DestAddr. 
void SendReport(String, DestAddr, Port):    Send String to DestAddr and Port. 
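The propagation behavior described above (forward to every neighbor except the original sender, with at most one copy of an application per node) can be illustrated with a small simulation. The Python sketch below is a model under stated assumptions, not the Session Broker implementation: the `topology` dictionary, the node names, and the message counting are hypothetical.

```python
from collections import deque


def distribute(topology, origin):
    """Simulate flooding an application from `origin`.

    `topology` maps each node to a list of its neighbors. Returns the set of
    nodes that end up running the application and the number of messages sent.
    """
    installed = {origin}
    messages = 0
    queue = deque([(origin, None)])  # (node, neighbor the code arrived from)
    while queue:
        node, sender = queue.popleft()
        for neighbor in topology[node]:
            if neighbor == sender:
                continue  # do not send the code back to the original sender
            messages += 1
            if neighbor not in installed:
                # A node keeps a single copy; duplicates are not re-propagated.
                installed.add(neighbor)
                queue.append((neighbor, node))
    return installed, messages
```

In this model a node that already holds the application simply ignores a second copy, so the flood terminates even when the topology contains cycles.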



The Active Process Interaction with Its Environment 






2.2 The Info Broker 

The Info Broker is used to export local device information to active sessions. 
Specifically, the Info Broker specializes in retrieving information or state from 
local devices for active sessions. Without the Info Broker, active session program- 
mers must worry about how to get local information and what kind of informa- 
tion is available. The three components of the Info Broker, Engine, Interface, 
and Cache (See Figure 2), abstract these details away from the application pro- 
grammer all while providing an infrastructure for efficient information retrieval. 
In the following text, we will discuss each component in more detail. 




Engine The Engine is the “intelligence” of the Info Broker. Active sessions send 
requests for local state to the Engine. The Engine then determines the “best” 
avenue to handle the request, retrieves the information, and finally translates it 
back to the active session. 

The most interesting part of the Engine is how it selects the “best” avenue. 
Data is classified by its volatility. In the simplest case one can identify two 
types of data items: volatile, such as the number of active TCP connections, 
or non-volatile, such as the router name. However, volatility is a continuous 
spectrum where each datum can be associated with the length of time it can 
be cached before its value is expected to change significantly. To simplify the 
implementation, the Engine classifies a requested datum into one of three classes: 
volatile, quasi-volatile, or static. Static and quasi-volatile data are cached in the 
local Cache. To quasi-volatile data we attach a time-to-live counter that 
invalidates the entry after a predetermined time period. Quasi-volatile data that expires 
can either be removed from the cache or be refreshed. 
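The three-class policy can be sketched as follows. This is a minimal Python illustration under assumptions: the class name `InfoCache`, the `fetch` callback standing in for a query to the router, and the choice to refresh expired entries on the next request (rather than removing them) are hypothetical and not taken from the ABLE++ implementation.

```python
import time

VOLATILE, QUASI_VOLATILE, STATIC = "volatile", "quasi-volatile", "static"


class InfoCache:
    """Cache static data indefinitely and quasi-volatile data with a time-to-live."""

    def __init__(self, fetch, classes, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch        # callable(key) -> value, e.g. a query to the router
        self.classes = classes    # key -> volatility class (unknown keys are volatile)
        self.ttl = ttl_seconds    # key -> time-to-live for quasi-volatile keys
        self.clock = clock        # injectable clock, useful for testing
        self.entries = {}         # key -> (value, expiry time or None)

    def get(self, key):
        kind = self.classes.get(key, VOLATILE)
        if kind == VOLATILE:
            return self.fetch(key)          # volatile data is never cached
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and (hit[1] is None or now < hit[1]):
            return hit[0]                   # still valid: serve from the cache
        value = self.fetch(key)             # miss or expired: refresh on demand
        expiry = None if kind == STATIC else now + self.ttl[key]
        self.entries[key] = (value, expiry)
        return value
```

A static key such as the router name is fetched once, a quasi-volatile key such as the neighbor list is fetched again only after its time-to-live has passed, and everything else goes to the router on every request.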
The advantage of placing the policies in the Engine is that it abstracts the 
instrumentation details away from the programmer, thus reducing the complexity 
of applications. For example, if an application needs local state from three 
different managed devices and each device requires a different communication 
protocol, the programmer must implement each protocol to manage those de- 
vices thus resulting in a large, complicated program. Not only does the Engine 
abstract those details from the application layer, it also reduces response latency 
by choosing an appropriate avenue for retrieving information. Currently, we have 
implemented two modules, SNMP and Command Line Interface (CLI). We also 
understand there are some instances in which the application would like control 
over how the information is retrieved. Thus, we have added small instrumentation 
hooks like SNMP’s get(oid) and getnext(oid) to force the Engine to use 
a particular instrumentation. This brings us to an important question: how do 
active sessions communicate queries to the Engine? 

Interface The Engine exports local state to active sessions via a predefined in- 
terface (see Table 2 for a list of methods). As mentioned previously, the main 
design goal of the Info Broker is efficient access to local device state. Current 
information retrieval models allow applications to ask for small “atomic” pieces 
of local state. However, applications monitor and make decisions at a much 
higher level. For example, a load balancing application needs to calculate load 
on managed interfaces. In the current practice, interface “load” is not exported 
at the device level. It must be derived from atomic pieces of local state such as 
MIB-II’s ifOutOctets and ifSpeed over a given period of time. The application 
must query for several variables and compute the load before continuing with the 
algorithm. This is an inefficient use of system resources and time. The application 
must wait until all variables have been received before proceeding. Waiting 
through possibly several round-trip transmit times prevents the application from 
working at a fine-grained time scale. We have addressed this problem by exporting 
high level data. Applications can now ask “Who are my neighbors?” with one 
query, “How many interfaces are alive?” with two queries, and finally “What is 
the load on this interface?” with one query, just to name a few. We have pushed 
the data correlation into the device level thus reducing the number of queries 
generated at the application level as well as the total round-trip latency. 
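To make the derivation of interface load from atomic state concrete, the Python sketch below computes utilization from two samples of ifOutOctets and sysUpTime together with ifSpeed. It relies on the standard MIB-II conventions (ifOutOctets is a 32-bit counter of octets, sysUpTime is in hundredths of a second, ifSpeed is in bits per second); the function name and the wrap-around handling are assumptions of this sketch, not the Info Broker’s code.

```python
COUNTER32_MOD = 2 ** 32  # ifOutOctets is a 32-bit counter that wraps around


def interface_load(octets_t0, ticks_t0, octets_t1, ticks_t1, if_speed_bps):
    """Return link utilization (0.0 to 1.0) between two samples.

    octets_*: ifOutOctets readings; ticks_*: sysUpTime readings in hundredths
    of a second; if_speed_bps: ifSpeed in bits per second.
    """
    # Modular subtraction tolerates a single counter wrap between samples.
    delta_octets = (octets_t1 - octets_t0) % COUNTER32_MOD
    delta_seconds = (ticks_t1 - ticks_t0) / 100.0
    if delta_seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("samples must be ordered in time and ifSpeed must be positive")
    return (delta_octets * 8) / (delta_seconds * if_speed_bps)
```

An application using only atomic queries would need at least five round trips for this (two counter samples, two time samples, and the link speed), which is the overhead that a single high-level load query avoids.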

Cache We have added a small local cache to decrease query response time at the 
application level and reduce the query load from the managed device. Our cache 
reflects the design of the Interface and Engine policies. As mentioned before, we 
only cache popular static and quasi-volatile local state. 

Accessing the router’s local information inherently takes much longer 
than accessing the local cache (see Section 3). This is because routers are not 
optimized to answer queries and treat them with low priority. On the other hand, 
accessing the local cache does not require crossing of the user/kernel boundary 
and is much faster. When multiple sessions require the same local data, using 
the cache reduces the number of queries the router receives. 
Table 2. Info Broker Interface methods 

Name                                    Description 
int getNumIf():                         Number of interfaces 
String getName():                       Local machine name 
String[] getIpAddrs(Interface):         List of IP addresses for Interface 
int getIfNumber(IPAddr):                Interface number for IPAddr 
String getNextHopAddr(DestAddr):        Next Hop IP address towards DestAddr 
String getNextHopIf(DestAddr):          Interface number of Next Hop 
                                        towards DestAddr 
float getLoad(Interface):               Load of interface 
int getStatus(Interface):               Operation status of Interface 
String[] getNeighbors():                List of IP addresses for all 
                                        "alive" neighbors except loopback 
String[] getNeighbors(Interface):       List of Neighbor IP addresses for Interface 
Boolean isLocalLoopback(IPAddr):        True if IPAddr is loopback 
Boolean isLocalLoopback(Interface):     True if Interface is loopback 
String[] getDestAddrs(Interface):       Destination IP addresses for Interface 

Caching state introduces a trade-off between retrieval efficiency and infor- 
mation freshness. A high update frequency benefits from storing fresh state, but 
at an increased retrieval cost. The cache policies must be tuned to achieve an 
appropriate trade-off between resource usage and stale state. Different caching 
policies can be set for each type of information. For instance, volatile data should 
be refreshed frequently, whereas static data will not change as often. 

Another related design issue is what triggers data updates. One option is 
to update the data periodically at some rate (that may depend on the data 
change rate [5]). Another option is to update stale data only when a new request 
arrives. Periodic updates may result in unnecessary overhead but maintain fresh 
information at all times, while the on-demand approach may have a longer reaction 
time if an application is accessing stale data. 
2.3 Control Broker 

The Control Broker is a bidirectional channel that enables active sessions to 
control the behavior of the router. This allows authorized sessions to perform 
network engineering tasks such as changing an IP route, or modifying the QoS 
parameters in the router. Giving such extended power to an active session 
comes at a heavy cost: unauthorized or malicious sessions can “take over” 
the router and destroy the entire network. Thus, one should be very careful with 
the use of this feature. Only authorized sessions should be allowed to use it, and 
the possible side effects should be minimized. Even when sessions do use this 
privilege correctly, there is a problem of coordinating the overall local (and also 
global) effect of the combined control action. Consider two NM applications: 
one is designed to monitor the status of all interfaces, and once a down interface 
is detected it tries to reset it; the other application is a security application 
that turns off an interface when an attack is detected. The results of these two 
applications working together in the same node might be that the interface is 
turned on and off alternately. 

The overall structure of the Control Broker is very similar to that of the 
Info Broker: it has an Interface that is a collection of classes used by the active 
session to request the service, and an Engine which processes requests. The 
Engine receives a request, checks whether it is legal by both authenticating the 
sender session and verifying no conflicts with previous, active requests exist, 
and then executes it using the best available underlying method or indicates 
an error has been found. The Security Module checks the authorization of a 
session to take certain control actions and detects conflicts. The Control 
Broker can communicate with the router to perform control tasks via SNMP, the 
router’s CLI, or a special-purpose control protocol. Note that the requirements 
of many control functions include the retrieval of information, so the control 
action functions, in many cases, return values to the active session. However, 
the focus and thus the performance considerations in the design of the Control 
Broker and the Info Broker are substantially different. 

A basic but extreme example of a control function we have implemented 
is cliControl(String command). This function takes a CLI command as an 
argument, executes it on the router, and returns the outcome string. This is a 
very basic, low-level, device-dependent function: the application (the active 
session) has to know (possibly by checking through the Info Broker) what type 
of router is located at the node, and then use a router-specific CLI to perform 
the control task. It is also very dangerous, as it allows the session to perform 
harmful operations such as the Unix shutdown command. 

A higher level example is the function tmpSetRoute(destination, 
gateway, routeTTL). This function creates a manual entry in the local router’s 
forwarding table for destination with gateway as the next hop, for a period 
of routeTTL milliseconds. The function returns immediately and does not wait 
until the new route has been cancelled. This requires that the Control Broker 
Engine maintain state for each route deflection request that is in progress. It 
also has to resolve and raise alerts for possible conflicts. This design ensures the 
atomicity of the Control Broker service, which has an important safety aspect. 
Since the Broker is expected to be more stable than any session, atomicity ensures 
that any temporary change a session makes in the router state will indeed 
be cleared in due time. However, if the Control Broker fails, the router may be 
left in an unstable state. We discuss this further in Section 4. 
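The state that the Control Broker Engine must keep for temporary routes can be sketched as below. This Python model is an assumption-laden illustration: the forwarding table is a plain dictionary standing in for the router, time is passed in explicitly instead of using timers, and rejecting a second deflection for the same destination is just one possible conflict policy. None of these details are taken from the ABLE++ implementation.

```python
class ControlEngine:
    """Tracks temporary route deflections and undoes them when they expire."""

    def __init__(self, forwarding_table):
        self.table = forwarding_table  # destination -> gateway (stand-in for the router)
        self.deflections = {}          # destination -> (session, previous gateway, expiry)

    def tmp_set_route(self, session, destination, gateway, route_ttl_ms, now_ms):
        """Install a temporary route and return immediately."""
        if destination in self.deflections:
            owner = self.deflections[destination][0]
            raise RuntimeError(f"conflict: {destination} is already deflected by {owner}")
        previous = self.table.get(destination)
        self.deflections[destination] = (session, previous, now_ms + route_ttl_ms)
        self.table[destination] = gateway

    def expire(self, now_ms):
        """Undo every deflection whose time-to-live has elapsed."""
        for destination, (_, previous, expiry) in list(self.deflections.items()):
            if now_ms >= expiry:
                if previous is None:
                    self.table.pop(destination, None)  # entry did not exist before
                else:
                    self.table[destination] = previous
                del self.deflections[destination]
```

Because the Engine, not the session, owns the expiry bookkeeping, the temporary entry is cleared even if the requesting session terminates before the time-to-live runs out.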

3 Performance 

In this section, we demonstrate the efficiency of the Info Broker by analyzing the 
response time between an active session and a managed router’s SNMP agent. 
We are using the following three functions, each differing in complexity (See 
Table 2 for the entire list): 

getNeighbors() walks the SNMP routing table (ipRouteTable) looking for directly 
linked IP addresses (ipRouteType). The function returns a list of 
neighbors for all local interfaces. 

getLoad(interface) polls the router for the number of transmitted octets 
(ifOutOctets), system time (sysUpTime), and link speed (ifSpeed) 
over a period of time. The load is computed as the rate of 
transmitted octets divided by the link speed. We ignore the 
waiting interval between measurements. 

getNumIf() simply polls the router for the number of interfaces (ifNumber) 
and then returns this number. 

Our test program performed 200 consecutive function calls for each of the 
three functions above and for each of the following three monitoring channels (a total of 9 
trials): 

Direct SNMP opens a connection directly with the SNMP agent, polls for 
needed state using only SNMP get and getnext functions, then computes 
the result within the active session. 

Broker SNMP (No Cache) opens a connection with the SNMP agent 
through the Info Broker, polls for needed state using only the Info Bro- 
ker get and getnext functions, then computes the result within the active 
session. (The only difference between this channel and Direct SNMP is an 
additional inter-process communication (IPC) between the session and the 
Info Broker.) 

Broker SNMP (Cache) opens a connection with the Info Broker and polls 
the local Cache for computed result. 

Figure 3 displays the median response times for the different monitoring 
channels and functions. Our implementation of the Info Broker, as well as the active 
sessions, runs on JVM 1.1.8 on 200 MHz Pentium PCs with 64 MB RAM running 
the FreeBSD 3.2 operating system. Response times were measured using Java’s 
system clock method (System.currentTimeMillis()). Over 200 consecutive trials, 
we notice periodic increases in response times. We suspect the increase is due 
to Java garbage collection and application thread scheduling algorithms in the 
JVM. Therefore, we have plotted the median of the 200 values to better represent 
our data. For getNeighbors() using Direct SNMP and Broker SNMP (No Cache), 
the median value is within 7.14% of the 25% quartile value and 12.4% of the 75% 
quartile value. In the rest of the experiments, the median, 25% quartile and 
75% quartile values differed by only 1 ms. The reason for the higher difference 
in the first two experiments may be attributed to the longer running time of the function. 

Fig. 3. Response Time Between Active Session and Router (functions shown: getNeighbors, getLoad, getNumIf) 
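The reason for reporting the median rather than the mean can be seen in a small Python sketch using the standard `statistics` module. The sample values in the usage note are invented for illustration and are not measurements from the experiment.

```python
import statistics


def summarize(samples_ms):
    """Return quartiles, median and mean of a list of response times (in ms)."""
    q1, median, q3 = statistics.quantiles(samples_ms, n=4, method="inclusive")
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(samples_ms),
    }
```

For example, with 190 hypothetical samples of 3 ms and 10 hypothetical samples of 40 ms (standing in for occasional garbage-collection pauses), `summarize` reports a median and both quartiles of 3 ms, while the mean is pulled up to 4.85 ms by the few slow calls.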

In all three functions, using the cache decreased response latency. In the most 
complex function, getNeighbors(), the cache improved performance by approximately 
98% over both Direct SNMP and Broker SNMP with no Cache. This 
demonstrates that caching the right data can be extremely effective in reducing 
response latency, especially for complex functions. 

Another important point to note in Figure 3 is the difference between Direct 
SNMP and Broker SNMP (No Cache) response times. Both monitoring channels 
are the same except for the IPC cost between the active session and the Info 
Broker. Thus the difference of response times is the overhead of using the Info 
Broker. The Info Broker overhead is insignificant, as shown by the getNumIf() method, 
which simply retrieves one atomic piece of state. (Our measurement tools could 
not measure time at a finer granularity than 1 ms.) 



4 Security 

As mentioned before, security and safety are crucial components of ABLE++. 
The separation between the Active Engine (AE) and the router plays a significant 
role in asserting safety. However, when an active session has the power to 
manipulate the router’s state, and to divert non-active streams, separation is 
not sufficient. Therefore, we have built a multi-level mechanism to ensure the 
secure and safe operation of ABLE++. 

The main idea is that sessions (at the thread or process level) will be isolated, 
much like the sandbox concept of Java applets. However, in our case, sessions 
may need to have access to resources such as the file system, communication 
to the outside world, or access to the router’s state. Thus, we have to tailor 
the right envelope around each session, in a way that will allow it to perform 
restricted actions in a controlled way, and deny the use of unauthorized services. 
We call this flexible envelope the rubber box. 

As explained in Section 2.1, all the communication to or from the session is 
done through the Session Broker. This serves several purposes simultaneously. 
First, at any given time there is only a single copy of each session; additional 
code belonging to the same session will be sent to the session and will not 
create a new copy. It also prevents a session from interfering with other sessions’ 
communication, or manipulating other sessions’ code. 

When a session is created, a list of authorized services is built in the session 
control block, and the required authentication is checked, using the ANEP header 
options [2]. The granularity of the authorized service list can be very fine, e.g., 
describing each of the Control Broker functions, or coarse with a few pre-defined 
security permission levels. 

When the session’s Java code is first run, we execute it inside a Class Loader, 
modified to work in the appropriate level. For example, if the session is unau- 
thorized to access the file system, it will be run inside a loader that blocks all 
classes that access the file system. This adds a second level of safety around the 
session. 

When services (such as Control Broker or Info Broker) are requested, the 
appropriate broker checks the authorized services list and acts accordingly. 
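The per-request check against the authorized services list might look like the following Python sketch. The class names, the `broker.service` naming scheme for list entries, and the use of exceptions to signal denial are assumptions made for illustration; the actual session control block layout is not described here.

```python
class SessionControlBlock:
    """Holds the services a session was authorized to use when it was created."""

    def __init__(self, session_id, authorized_services):
        self.session_id = session_id
        self.authorized_services = frozenset(authorized_services)


class Broker:
    """A broker that checks the caller's authorized services before acting."""

    def __init__(self, name, services):
        self.name = name
        self.services = services  # service name -> callable implementing it

    def request(self, scb, service, *args):
        if service not in self.services:
            raise KeyError(f"{self.name} broker has no service {service!r}")
        # Fine-grained check: one list entry per broker function.
        if f"{self.name}.{service}" not in scb.authorized_services:
            raise PermissionError(
                f"session {scb.session_id} may not call {self.name}.{service}")
        return self.services[service](*args)
```

A coarser policy could be modeled in the same way by storing a few permission levels in the control block and mapping each service to the minimum level it requires.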

In addition, we use an IP level packet filter for the outgoing packets. A packet 
that originated from one of the active sessions, and was not sent through the 
Session Broker, is blocked. This prevents a session from trying to communicate 
outside the execution environment without proper authorization. 

The problem of deadlock prevention, and safe execution of the code of each 
session, is of course a very hard problem. We do not intend to address it here. 
Safe languages [9] and techniques like Proof Carrying Code [10] can be used to 
address some of these problems. 

5 An Application Example — Congestion Avoidance 

In the current Internet, congestion is treated by end-to-end flow control. The network 
signals to the end-points the existence of congestion. As a result, the end-points 
reduce the rate at which they inject new packets into the network. Usually 
congestion is signaled via packet drop, which is a wasteful process because the 
network has already consumed resources in getting the packets to the point in the 
network where they were dropped. Recent suggestions to use explicit congestion 
notification (ECN) [4] allow signaling about congestion before packets must be 
discarded, enabling a more efficient use of the network bandwidth. In both cases, 
the network reaction time depends heavily on the algorithm at the end-points, 
namely TCP, and on the round trip delay of the various connections. 






These end-to-end flow control mechanisms have proved to be stable and
efficient; however, due to the long control loop they do not prevent transient
congestion in which packets are dropped. We suggest a local congestion avoidance
algorithm to augment current mechanisms. The main idea behind the algorithm
is to find a temporary deflection route for a portion of the traffic passing through
a congested link. The algorithm thus eases the load at this link and reduces the
amount of packet loss until the end-to-end flow control decreases the flow rates.
It is important to note that the algorithm does not interfere with the routing
algorithm and can work in conjunction with any existing routing algorithm.
The deflection is done by adding entries to the forwarding tables, temporarily
overriding the route chosen by the routing algorithm.



5.1 General Description 

After initiation, each node locally monitors the load on each of its outgoing
interfaces. When an interface is identified as congested, e.g., by comparing the
interface’s current transmission rate to its maximum transmission rate, the node
tries to find deflection routes. The first step is to identify a destination (d)
of some flow that passes through the congested interface. Then the node (denoted
by c) sends the message CONGESTED(c, c, d, 0) to all its neighbors that are not
connected through the congested interface. The first field identifies the
congested node id, the second field is the message sender id, the third field is
the chosen destination, and the last field is a counter denoting the hop
distance of the sender from the congested node. A node, n, that receives
CONGESTED(c, c, d, 0) can be either upstream from the sender, i.e., it forwards
packets to destination d through node c, or off-stream, meaning it forwards its
packets to d via some other route. In the latter case, node n sends node c the
message ARF(n, d), indicating that an Alternative Route was Found. In the former
case, node n propagates the search by sending the message CONGESTED(c, n, d, 1)
to all its neighbors except c.

A node, n', that receives the message CONGESTED(c, n, d, 1) sends ARF(n', d)
to node n if its next hop to destination d is neither node c nor node n.
Otherwise, it ignores the message.

A node that receives the message ARF(n, d) adds a temporary static entry 
to its forwarding table indicating that the next hop to destination d is node n. 
This static route is automatically removed after a pre-configured time interval. 
The algorithm’s pseudo-code appears in figures 4 and 5. 
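
To make the message exchange concrete, the following sketch simulates the
CONGESTED/ARF exchange on a small topology. It is a minimal illustration under
stated assumptions, not the authors' implementation: the function name
find_deflections, the message queue standing in for the network, the six-node
example topology, and the rule that the first ARF received wins are all
assumptions introduced here.

```python
from collections import deque


def find_deflections(neighbors, next_hop, c, d, congested_neighbor):
    """Run the two-hop CONGESTED/ARF exchange for destination d.

    neighbors: node -> set of adjacent nodes
    next_hop:  node -> next hop toward d chosen by the routing algorithm
    Returns {node: temporary_next_hop} for the routes installed by ARF messages.
    """
    temp_routes = {}
    # Each queue entry is (kind, receiver, fields); the queue stands in for
    # the network (an assumption of this sketch).
    queue = deque()
    for n in neighbors[c] - {congested_neighbor}:
        queue.append(("CONGESTED", n, (c, c, d, 0)))
    while queue:
        kind, node, fields = queue.popleft()
        if kind == "CONGESTED":
            cong, sender, dest, cnt = fields
            if next_hop[node] in (cong, sender):       # upstream
                if cnt == 0:
                    for j in neighbors[node] - {cong}:
                        queue.append(("CONGESTED", j, (cong, node, dest, 1)))
            else:                                       # off-stream
                queue.append(("ARF", sender, (node, dest)))
        else:
            # ARF(m, d): install a temporary route via m (first ARF wins here).
            m, dest = fields
            temp_routes.setdefault(node, m)
    return temp_routes


# Hypothetical topology: c reaches d via x (congested link); a reaches d via b;
# u normally forwards to d through c.
neighbors = {
    "c": {"x", "a", "u"}, "x": {"c", "d"}, "d": {"x", "b"},
    "a": {"c", "b", "u"}, "b": {"a", "d"}, "u": {"c", "a"},
}
next_hop = {"c": "x", "x": "d", "a": "b", "b": "d", "u": "c", "d": None}
routes = find_deflections(neighbors, next_hop, "c", "d", congested_neighbor="x")
```

In this example both the congested node c and its upstream neighbor u end up
with a temporary route through the off-stream node a. Expiry of the temporary
entries is not modelled.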

The algorithm design details and its global performance are beyond the scope
of this paper. We concentrate here on the services required from the active node
environment to efficiently facilitate the algorithm’s execution.

5.2 Implementation Discussion 

In this section we discuss services the application receives from ABLE++. We
believe these services are typical for many applications running above any NodeOS.

The simplest group of services contains functions that retrieve atomic data
such as the local host name. These calls are omitted for brevity from the code
of Figures 4 and 5. The application can learn its local machine name for the
sender id (line 5 in Figure 4, and lines 6 and 8 in Figure 5) by calling the
Info Broker getName() function. Other functions of the group are defined to get
the router OS, or IP statistics. The main advantage of these simple services is
the abstraction of the interface type, the simplification of obtaining the data
(one single call), and, in the case of static data such as the hostname, the
ability to use caching.

1. foreach interface i
2.     if load(i) > threshold then
3.         d ← finddest(i)
4.         foreach interface j ≠ i
5.             send CONGESTED(c, c, d, 0) to all neighbor(j)

Fig. 4. The pseudo-code for the load detection algorithm at node c.

1. For CONGESTED(c, s, d, cnt)
2.     if nexthop(d) = c OR nexthop(d) = s then // upstream
3.         if cnt = 0 then
4.             foreach j ∈ neighbors()
5.                 if j ≠ c then
6.                     send CONGESTED(c, n, d, 1) to j
7.     else // offstream
8.         send ARF(n, d) to s
9. For ARF(m, d)
10.     settemproute(d, m)

Fig. 5. The algorithm pseudo-code for node n.

Next in complexity are services that require a modest number of queries. An
example in the code of Figure 4 is load(i), which checks the load on interface i.
In our implementation, this function has an optional parameter that defines the
measurement granularity by specifying the time difference between the two queries
that check how many bytes have been sent (ifOutOctets). Together with the
interface speed, one can calculate the load.
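
The load calculation can be sketched as follows. This is an illustrative
helper, not the service's actual code: the function name interface_load and its
parameters are assumptions, and the sketch assumes a 32-bit octet counter that
wraps at most once between the two samples and an interface speed given in bits
per second.

```python
def interface_load(octets_t0, octets_t1, interval_s, speed_bps, counter_bits=32):
    """Fraction of link capacity used between two octet-counter samples.

    octets_t0, octets_t1: counter readings at the start and end of the interval
    interval_s:           time between the two readings, in seconds
    speed_bps:            interface speed in bits per second
    """
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval and speed must be positive")
    # The counter wraps modulo 2**counter_bits; the modulo absorbs one wrap.
    delta_octets = (octets_t1 - octets_t0) % (1 << counter_bits)
    return (delta_octets * 8) / (interval_s * speed_bps)


# 625,000 octets sent in 5 s on a 10 Mbit/s link.
load = interface_load(1_000_000, 1_625_000, interval_s=5, speed_bps=10_000_000)
# Same helper across a counter wrap: 250 octets in 1 s on an 8 kbit/s link.
wrapped = interface_load(2**32 - 100, 150, interval_s=1, speed_bps=8_000)
```

A shorter interval gives a finer measurement granularity at the price of more
frequent queries.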

The most complex services are the ones involving neighbor lists. MIB-II [7]
does not hold this list explicitly, and one needs to search for the neighbors in
tables that were designed for other purposes, e.g., in the IP routing tables. The
task is further complicated by small variations in the implementation of these
tables among manufacturers. As a result, obtaining the neighbor list
(neighbors()) is cumbersome and hard for an application developer. Other
services we supply in this problem domain are the list of all neighbors attached
to a specific interface (neighbor(j)), and the next-hop neighbor on the route to
some destination (nexthop(d)).

In addition, we supply a service which is somewhat tailored to this specific
application: finddest(i) returns some destination to which packets are routed
through interface i. Our strong belief in local algorithms for computing global
functions suggests that other such applications can benefit from this service.
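
The sketch below shows how these four services could be derived from a routing
table. It is a simplified illustration: the Route record and the RoutingView
class are hypothetical and do not reflect the MIB-II table layout, and the
sketch assumes that every neighbor appears as the next hop of at least one
route.

```python
from collections import namedtuple

# Hypothetical, simplified routing-table row (not the MIB-II layout).
Route = namedtuple("Route", "dest next_hop interface")


class RoutingView:
    """Derives neighbor information from routing-table rows."""

    def __init__(self, routes):
        self.routes = list(routes)

    def neighbors(self):
        # Every distinct next hop is treated as a neighbor.
        return sorted({r.next_hop for r in self.routes})

    def neighbor(self, interface):
        # Neighbors reached through one specific interface.
        return sorted({r.next_hop for r in self.routes
                       if r.interface == interface})

    def nexthop(self, dest):
        for r in self.routes:
            if r.dest == dest:
                return r.next_hop
        return None

    def finddest(self, interface):
        # Any destination routed through the interface will do.
        for r in self.routes:
            if r.interface == interface:
                return r.dest
        return None


view = RoutingView([
    Route("d1", "n1", "eth0"),
    Route("d2", "n2", "eth1"),
    Route("d3", "n1", "eth0"),
])
```

Hiding vendor-specific table variations behind one such view is what spares the
application developer the cumbersome search described above.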

In order to react to problems by taking corrective actions, an application
(with the right permission) must be able to control the router. In this example,
the function settemproute(d, m) adds a temporary static route to node d
through the neighbor m. This service does more than abstract the CLI (command
line interface) from the programmer. A session can specify the duration of
the temporary route, leaving the service responsible for deleting the static
route after the specified time period has expired. In addition, the service
validates that no conflicts between requests exist, and is responsible for
eventually cleaning up the temporary changes even if the application terminated
abnormally.
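
A minimal model of such a service is sketched below. The class
TempRouteService, its method signatures, the explicit now argument (used
instead of a real clock to keep the example deterministic) and the conflict
rule (a live route owned by another session blocks the request) are all
assumptions made for illustration.

```python
class TempRouteService:
    """Toy model of a temporary-route service with expiry and cleanup."""

    def __init__(self):
        # dest -> (next_hop, expiry_time, owner_session)
        self._temp = {}

    def settemproute(self, session, dest, next_hop, now, duration):
        self.expire(now)
        current = self._temp.get(dest)
        # Reject a request that conflicts with another session's live route.
        if current is not None and current[2] != session:
            return False
        self._temp[dest] = (next_hop, now + duration, session)
        return True

    def expire(self, now):
        # Remove every temporary route whose lifetime has ended.
        for dest in [d for d, (_, t, _) in self._temp.items() if t <= now]:
            del self._temp[dest]

    def cleanup_session(self, session):
        # Called when a session terminates, normally or not.
        for dest in [d for d, (_, _, s) in self._temp.items() if s == session]:
            del self._temp[dest]

    def lookup(self, dest, now):
        self.expire(now)
        entry = self._temp.get(dest)
        return entry[0] if entry else None


svc = TempRouteService()
first = svc.settemproute("s1", "d", "m", now=0, duration=30)
second = svc.settemproute("s2", "d", "k", now=10, duration=30)
```

Because the service, not the session, owns the expiry and the cleanup, a
session that crashes cannot leave a stale static route behind.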

In general, control services are more complex than monitoring services since (as
we see in the example above) one needs to make sure that an application error
will not have a long-term negative effect on the router operation. Specifically,
one needs to check for state coherency at all times. Thus, control services need
to be at a higher level of abstraction. To illustrate this, suppose an
application wants to change the route to some destination, d. One way to do this
is to supply the application with two services: deleteroute(d) and setroute(d).
In this case, if due to an error the application terminates after deleting the
route and before adding the new one, packets to d will be routed through the
default route, which might cause routing loops. A better design of the interface
is to supply a higher-level abstraction, say a service forceroute(d), that
performs the delete and set operations at the Broker Interface level.

Abstract. In this paper a multicast application is presented, based on
active networking techniques, in which processing of the multicasted data
is done in intermediate network nodes, according to requests or demands
made by users joining the multicast session. The problem of transcoding
in multicast sessions is divided into two subproblems. First, some issues
about the application of active networking techniques are briefly
discussed. Then, the problem concerning the optimisation of the location of
transcoding nodes in the multicast tree is treated as the main subject of this
paper. The exact solution and a number of heuristics are discussed, and
simulations of the solutions illustrate the advantages of applying
transcoding in multicast sessions.

Keywords: Active Networks, Multicast Tree, Transcoding, Tree Optimisation
Problem

H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 130-144, 2000.



1 Introduction

The revolutionary technique of active networking [1, 2, 3, 4] brings a whole range
of new possibilities to the field of packet based networking. Present networks, in
which the only functionality is based on storing and forwarding of packets in the
nodes, will be replaced with an active network, enabling extra functionality in
the network. In active nodes, the traditional concept of Store-and-Forward will
be replaced by the new concept of Store-Compute-and-Forward, enabling some
extra processing to be done on the packets and their content, before they are
forwarded. In this new concept, packets will not only contain data, but will also
be able to transport pieces of code that can be executed while passing through
an active node. In this way, active packets containing active code will be able to
travel through the network, executing the code in the active nodes. The code can
be carried out-band, in packets exclusively transporting active code, or in-band
together with an amount of data in one packet. Furthermore, it is possible to
load extra code into an active node, providing extra functionality in the network
in a dynamic way. These new techniques and possibilities of performing extra
processing on datastreams passing through active nodes, injecting code in the
network and configuring the network dynamically by loading extra functionality
into the nodes, give rise to extensions of existing applications and new
applications finally becoming feasible.

Fig. 1. Transcoding in an audio multicast session

Whereas present applications in networks are exclusively based on functionality
provided in the end-points of communication and the network is only serving for
the forwarding of packets, it is possible with active networks to base the
application on functionality provided both in the network and in the
end-points. A category of applications that will surely benefit from the new
abilities provided by active networks are multicast applications.
In this paper a multicast application is presented, based on active networking
techniques, in which processing of the multicasted data is done in intermediate
nodes, according to requests or demands made by users joining the multicast
session. In traditional multicast applications, all users receive the same version
of the data when joined to the multicast session. This implies that all users
receive the data in the same encoding and at the same rate, irrespective of the
capabilities of the terminals or machines used by the users. The users have no
choice about how and in what form they receive the data. It would however be
a welcome extension to provide the users with a number of versions of the data
with different bitrates to choose from. In this way, a certain customisation of
the data streams is possible and a user can choose an appropriate data rate,
according to the actual capabilities of his access or up-link to the network. For
instance, an audio stream can be multicasted using a number of codecs, resulting
in a number of available bitrates. Users can for example choose from the 64 kbit/s
A-law codec [5], being the data at full rate, the 40 kbit/s ADPCM codec [6] or
the 13 kbit/s GSM-FR codec [7]. Then a user connected via an ISDN line will
probably opt for the 64 kbit/s codec, a user connected via a 56K6 voice modem
may choose the 40 kbit/s codec, whereas the user on the 33K modem will have
to go for the 13 kbit/s codec.

Providing this kind of service in present networks is only possible by setting
up a multicast session for each of the different versions of the data. A user
then has to join the appropriate session to receive the desired version. By
contrast, in active networks it is possible to provide this service using only
one multicast session and using transcoding of the data in the active nodes. One
multicast session is set up and the full-rate version of the data is injected in
the multicast tree. Then, in intermediate nodes of the tree a conversion of the
incoming data stream to a lower bitrate may be done, according to the demands of
the users serviced downstream by this node. The active solution for the example
described above will need the set up of a multicast session servicing the three
users. In the multicast tree the full-rate data (64 kbit/s) will be injected
and in total two conversions of the data to a lower bitrate (40 and 13 kbit/s)
will need to be done in certain intermediate nodes of the tree. The resulting
data streams and necessary transcoding nodes in the tree of a possible solution
are shown in figure 1.

The problem of data transcoding in multicast sessions can be divided into two
sub-problems. A first problem concerns the application of active networking
techniques to realise the transcoding of data in intermediate active nodes. The
second problem concerns the optimisation of the location of these transcoding
nodes in the multicast tree. The first problem will be briefly discussed in
Section 2, pinpointing the most important problems and points of attention. A
look will be taken at some architectural issues, concerning the necessary active
networking capabilities and the possibilities for data transcoding. The second
problem will then be discussed in detail in the remaining Sections 3 and 4 of
this paper. In Section 2, in addition to the active networking issues, the
advantages of this concept in contrast with traditional solutions will also be
shown and a number of relevant applications based on the concept will be
suggested.
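
The codec choice in the audio example of this introduction (64, 40 and
13 kbit/s) can be sketched as a simple selection rule. The function name
choose_codec, the nominal link rates used in the usage lines, and the
simplification that packet header overhead is ignored are assumptions of this
sketch.

```python
# Codec bitrates in kbit/s, as given in the example of the introduction.
CODECS = {"A-law": 64, "ADPCM": 40, "GSM-FR": 13}


def choose_codec(link_kbps):
    """Return the highest-bitrate codec that fits the access link, or None.

    Header overhead is ignored (an assumption of this sketch).
    """
    fitting = [(rate, name) for name, rate in CODECS.items() if rate <= link_kbps]
    return max(fitting)[1] if fitting else None


isdn_user = choose_codec(64)     # ISDN line
modem56_user = choose_codec(56)  # nominal rate assumed for the faster modem
modem33_user = choose_codec(33)  # nominal rate assumed for the slower modem
```

With these nominal rates the three users end up with the 64, 40 and 13 kbit/s
versions respectively, as in the text.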

2 Transcoding in Multicast Sessions 

2.1 Transcoding in Active Networks 

To be able to do processing in intermediate nodes in the network, the existence 
of active nodes in the network is needed. Apart from these active nodes, the 
transcoding of data will only be possible in the source node of the multicast tree. 
With this assumption it is not necessary that the whole network becomes active 
all at once. It is even possible to introduce an application based on transcoding 
in a network with no active nodes present yet. In this case there will only be 
transcoding in the source and for each version of the data a multicast tree will 
be used. Then, when a gradual introduction of active capabilities in the network 
takes place, transcoding can be done in the active nodes and the source will 
gradually be relieved from transcoding. Before any transcoding can take place 
in an active node, the necessary code will have to be in place. This code can be 
put in place during configuration and kept in the node forever. It can however 
be more favourable and efficient to load the necessary code dynamically from a 
code-base when the functionality is needed and to discard the code when it has
expired, thus freeing the node resources for other applications. This technique
implies the use of the active node concept, in which the functionality to process 
the data is located in the active nodes and active packets only carry identifiers 
or references to the functions to be executed on the content of these packets. 

Processing of data passing through an active node requires a certain amount
of resources from the node. The resources of the active node will have to be
shared by the normal routing and forwarding activities on one side and the active
processes on the other side. Enough resources will have to be reserved for the
normal activities of the node, so that the best-effort forwarding service is
maintained. Consequently, only resources not needed by the routing and
forwarding process are available for active processes. The number of active
processes able to run without disturbing the normal functionality will thus be
limited and it will be necessary to study the available resources and processing
power. In this way, the available resources in an average node can be estimated
and the number of active processes able to run in parallel can be predicted.




Fig. 2. Frame dropping in MPEG video stream 



With the limitation of the processing power in mind, the question arises
whether transcoding in active nodes is really feasible. Coding and decoding of
a voice or video stream requires a considerable amount of processing power and,
especially when working in real time, the timing constraints become very rigid.
For the moment, therefore, only conversions of data with rather low demands on
processing power are feasible. Conversions between two audio or video codecs,
requiring the decoding of the incoming signal to the original sampled data and
the coding of this data into the new codec, are not yet feasible. Some other
conversions, by contrast, do not require that much processing power and can
already be very useful. First there are some embedded codecs, such as the G.727
embedded ADPCM voice codec [8]. This codec generates a number of bits per sample
and allows some of the least significant bits to be dropped from each sample,
resulting in a lower bitrate stream of lower quality. The conversion only
involves dropping some bits and does not need any heavy processing. Similarly,
layered media compression [9] gives a cumulative set of layers, where
information is combined across layers to produce progressive refinement. Again,
by dropping some of the information layers, the bitrate is reduced together with
the quality, while preserving the basic information or data. The same is
possible for the MPEG video codec [10], where the information is sent in I-, P-
and B-frames. In a first stage the P- and B-frames can be dropped and later even
some of the I-frames can be ignored to further reduce the bitrate [11]. This is
illustrated in figure 2. These conversions, while offering a very simple bitrate
reduction mechanism, need specific coding and decoding schemes to enable this
possibility. This is not the case when using simple frame dropping in a JPEG
video stream. Every frame is compressed using JPEG, and the frame rate is
reduced simply by dropping certain frames entirely.
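
The two-stage frame dropping described above can be sketched on a sequence of
labelled frames. This is only an illustration: frames are modelled as
(frame_type, payload) pairs rather than parsed from a real MPEG bitstream, and
the function name drop_frames and its parameters are hypothetical.

```python
def drop_frames(frames, keep_types=("I",), keep_every_nth_i=1):
    """Thin a frame sequence by type; frames are (frame_type, payload) pairs.

    keep_types:       frame types that survive the first stage
    keep_every_nth_i: second stage, keep only every n-th I-frame
    """
    out, i_seen = [], 0
    for ftype, payload in frames:
        if ftype not in keep_types:
            continue
        if ftype == "I":
            i_seen += 1
            # Second stage: skip I-frames that are not on the n-th position.
            if (i_seen - 1) % keep_every_nth_i != 0:
                continue
        out.append((ftype, payload))
    return out


# Hypothetical frame pattern; the payload is just the frame index.
stream = [(t, n) for n, t in enumerate("IBBPBBPBBIBBPBBPBBIBBP")]
stage1 = drop_frames(stream)                      # drop all P- and B-frames
stage2 = drop_frames(stream, keep_every_nth_i=2)  # also drop every other I-frame
```

No decoding or re-encoding is involved, which is why this kind of conversion
fits within the limited processing budget of an active node.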

2.2 Advantages 

Now, a number of advantages brought by transcoding in multicast sessions, in
comparison with traditional solutions, will be discussed. First of all, this
application offers extra functionality to the users of multicast sessions,
enabling them to customise the data they want to receive, according to the
capabilities of their system and access network. It provides more flexibility in
the application towards the users, giving them the ability to choose how they
want to receive the data and in what format.

A second advantage is particularly interesting because it concerns some of the
weak spots of the present Internet. Recently it has become obvious that in the
current best-effort Internet some of the main bottlenecks are situated in the
user’s access network on one side and at the application-providing servers on
the other side. It is clear that the capabilities of the user’s up-link to the
Internet limit the maximal speed or bandwidth available for the applications. So
the local access of the user restricts the possibilities, and thus the
capability to have for example a satisfactory real-time communication will
depend on the available bandwidth of the up-link. Another impediment to a fast
service in the present network is the fact that popular web or application
servers have to handle a large number of connections, often resulting in servers
being overloaded. This overload leads to connections being rather slow and other
incoming connections being denied service. Transcoding in multicast sessions can
be a cure for both of these problems. Whereas the currently available solution
would be to set up different multicast trees for each of the data versions, in
the active solution only one multicast tree needs to be set up, in which only
one version of the data has to be injected. In this way a reduction of the
server load is achieved. Furthermore, using network-based transcoding the user
is not restricted to one version of the data anymore. By enabling the user to
choose the appropriate data format, the influence of the access network is
reduced and the user can opt for a customised version that best suits the
abilities of his local access.

Another advantage of the transcoding concept is an optimisation of the overall
network load. Transmitting the data over a multicast tree already reduces the
network load, and the use of transcoding in turn gives an additional reduction
of the required bandwidth. To conclude, the transcoding concept brings extra
flexibility in the application and offers optimisation of the server load, the
load on the access network and the general bandwidth usage in the network.



2.3 Application 

As mentioned above, a number of audio and video codecs are very well suited 
to provide some easy transcoding abilities, making audio and video streaming 
some of the most obvious applications to benefit from the concept of transcoding 
in multicast sessions. Furthermore, these applications are very attractive for the 
general public, providing a very interesting and appealing illustration of the 
capabilities of active networks. Other applications can for example be found 
in services distributing a large amount of data in the network, on a regular 
basis. When the multicasted data has a rather heterogeneous character (i.e. it
contains different types of information), not every user will be interested in all
the data at once. Consequently, in intermediate nodes, the data streams can be
stripped of all information that is redundant for the downstream users, resulting
in a lower bandwidth being required. An example could be a real-time stock quotes
distribution service, sending the new values of all quotes into the tree every
minute. Then, based on the users’ requirements or preferences, quotes are passed
through or dropped. In this way, users receive only the data they are interested
in, and the rate at which new data are passed through can also be adjusted, for
example delivering the data every minute to one user and every 15 minutes to
another user. A news distribution service can function in the same way.
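
The per-user filtering of the stock quote example can be sketched as below. The
function filter_quotes, the data layout (one dictionary of quotes per minute)
and the symbols "AAA", "BBB" and "CCC" are hypothetical and serve only to
illustrate the idea.

```python
def filter_quotes(updates, symbols, period_min):
    """Select the updates one downstream user should receive.

    updates:    iterable of (minute, {symbol: price}), one entry per minute
    symbols:    set of symbols the user is interested in
    period_min: deliver only the updates of every period_min-th minute
    """
    delivered = []
    for minute, quotes in updates:
        if minute % period_min != 0:
            continue  # rate adaptation: skip minutes outside the user's period
        wanted = {s: p for s, p in quotes.items() if s in symbols}
        if wanted:
            delivered.append((minute, wanted))
    return delivered


# Thirty minutes of made-up quotes for three made-up symbols.
updates = [(m, {"AAA": 10 + m, "BBB": 20 + m, "CCC": 30 + m}) for m in range(30)]
fast = filter_quotes(updates, {"AAA"}, 1)          # every minute, one symbol
slow = filter_quotes(updates, {"BBB", "CCC"}, 15)  # every 15 minutes, two symbols
```

An intermediate node would apply the union of the filters of all users
downstream of it, so that only data needed further down the tree is forwarded.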



3 Transcoding Tree Optimisation 

When using transcoding in a multicast session, some of the active nodes in the 
network have to do the necessary data conversions. These nodes will be called 
transcoding nodes. There have to be enough nodes of this type, in order to make 
sure that all conversions are done and that every user receives the requested 
version of the data. The location of the transcoding nodes in the multicast tree is 
now subject to some optimisation, according to for example the total bandwidth 
usage in the tree and the number of conversions in the nodes. First a general 
description of the optimisation problem is given and then a number of solutions 
are described: the exact solution and two heuristics. 

3.1 Optimisation Problem 

First, some general assumptions concerning the optimisation problem are
discussed. The network contains a number of nodes, interconnected by a number of
edges. Initially it is assumed that all the nodes are active, so every node is
capable of performing transcoding tasks. Every link is characterised by its
bandwidth capacity and by a cost, related to the use of the available bandwidth.
The nodes in turn are characterised by a capacity, related to the available
resources and the number of data conversions that can be done. Consequently, a
non-active node, not able to do any transcoding, will have a zero capacity.
Additionally, a cost function for the node is related to the use of the
available processing power for transcoding.

In this network, a multicast session needs to be set up from a certain source
node to a number of destination nodes. The data can be delivered in a limited
number of versions or types and for every destination, a specification of the
requested data type is given. Every version of the data is characterised by its
bitrate or bandwidth and a conversion between two versions is characterised by
the amount of necessary processing power and resources. A transcoding between
two types is only possible when it concerns a conversion from a higher bitrate
to an equal or a lower bitrate. A number of destinations asking for the same
version or type of the data are grouped in a class. The classes are ordered
according to the required bandwidth for that data type. Class 1 destinations
request the data type with the highest amount of bandwidth required.
Consequently, transcodings are only possible from a lower class to a higher one
(e.g. from class 1 to class 3). Now, given every destination’s class, a
multicast tree needs to be set up from the source to the destinations,
containing a number of transcoding nodes, taking care of the conversions of the
data into the requested versions. The total cost of the tree will be determined
by a cost function, depending on the costs of links and (transcoding) nodes in
the tree. This cost can be optimised by adjusting the location of the
transcoding nodes, reducing the number of transcoding nodes and the total
bandwidth usage.

Returning to the example given above, the optimisation process can be
illustrated as follows (see figure 3). In a first stage, just the Steiner
minimal tree is used to connect the source and the three destinations. While
only using 3 links, the total bandwidth used amounts to 192 kbps (3*64 kbps). In
the second case the total bandwidth is reduced to 181 kbps by using another
multicast tree, containing one more link. However, with this tree the two
transcodings will have to be done in the same node. By restricting the number of
transcodings in a node this can be avoided and a better distribution of the
transcoding nodes over the multicast tree can be achieved. So, in the third
case, again a bandwidth of 181 kbps is consumed but the transcodings are done in
separate nodes.

Fig. 3. Transcoding Tree optimising

In the following paragraphs the solution of this problem will be described.
First the exact solution based on Integer Linear Programming (ILP) is discussed.
Because the problem is closely related to the Steiner Tree problem, it is also
NP-complete, and the exact solution will not be useful when dealing with large
networks. Therefore a number of solutions have been derived, based on
heuristics, allowing the calculation of the solution in a faster way, but
possibly resulting in a non-optimal solution. Further on, the implementations of
these solutions, both exact and heuristic, will be used to perform a number of
simulations, calculating the optimal multicast tree and the transcoding nodes.
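
The bandwidth totals of the example (192 and 181 kbps) can be checked with a
few lines of arithmetic. The per-link breakdown of the four-link tree is not
stated in the text; the sketch assumes that every link carries one of the three
rates of the audio example and verifies that only one such combination adds up
to 181 kbps. The names RATE_KBPS and total_bandwidth are introduced here for
illustration.

```python
from itertools import combinations_with_replacement

# Class -> bitrate in kbit/s, taken from the audio example (64, 40, 13).
RATE_KBPS = {1: 64, 2: 40, 3: 13}


def total_bandwidth(edge_classes):
    """Sum the reserved bitrate over the tree edges, given each edge's class."""
    return sum(RATE_KBPS[c] for c in edge_classes)


steiner_tree = [1, 1, 1]    # three links, all carrying the full-rate stream
longer_tree = [1, 1, 2, 3]  # assumed breakdown of the four-link tree

# Which combinations of four per-link classes give 181 kbit/s in total?
matches = [m for m in combinations_with_replacement(RATE_KBPS, 4)
           if total_bandwidth(m) == 181]
```

Under this assumption the four-link tree carries the full-rate stream on two
links and the 40 and 13 kbit/s streams on one link each, i.e.
64 + 64 + 40 + 13 = 181.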

3.2 Exact Solution and Heuristics 

First of all, a look is taken at the problem in the situation where no active 
nodes are present in the network. A first solution can then be the delivery of 
the different data types using point-to-point connections. This results in a large 
number of users connected to the server and in an inefficient use of bandwidth in 
the network. Because each type of data is requested by a number of different users 
at the same time, a better solution is achieved by using a multicast tree for each 
type to deliver the data. This way, server load and bandwidth usage are partly 
optimised. This second non-active solution will be used to compare with the 
active solutions, illustrating the possible optimisation when using transcoding 
in multicast sessions. 

To find the exact solution for the problem described above, it was formulated
as an Integer Linear Programming (ILP) problem. The details of this ILP
formulation will not be discussed here. It is based on an ILP formulation of the
Steiner Tree problem, to which a number of extra constraints and variables have
been added. The objective function to be minimised is a cost function based on
the contributions of the links and nodes of the tree. Because of the NP-complete
character of the problem, as already mentioned, a number of heuristics were
developed. Two heuristics for the problem are now presented: the first is called
the Skinned Steiner Tree Heuristic and the second the Branch Attach Heuristic.
Both heuristics are based on the calculation of one or more Steiner trees [12].
For the calculation of these trees, a heuristic [13] is used in turn, because of
the NP-complete character of the Steiner tree problem.



Skinned Steiner Tree Heuristic In this heuristic, it is attempted to calculate 
the Steiner tree containing the source and all destinations (of all classes), with 
a reserved bandwidth equalling the highest bandwidth requested. This way, it 
is assured that every edge in the tree has enough capacity to allow flows of any 
type. Due to this bandwidth demand, requiring the reservation of the highest 
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bandwidth requested, it is possible that some of the destination nodes are not 
yet attached to the tree. These nodes are then attached as well. First the edges 
with less capacity than the bandwidth needed for that specific destination are 
discarded. Then the cost of the edges already in the tree are set to zero and 
the shortest path from this node to the source is calculated. Edges belonging to 
this shortest path are added to the tree. At this stage, a tree connecting all of 
the destination nodes and the source is found, but no bandwidth optimisation 
has been done. This can be achieved by ’’skinning” the tree in order to reserve 
only the minimal amount of bandwidth needed. This is done by walking through 
the tree from the leaves to the source. In the leaves the required bandwidth on 
the incoming edges is known. When the bandwidth on the edges to the leaves is 
reserved, the algorithm works his way up the tree, determining the bandwidth 
needed on the other edges and the transcoding that needs to be done in the nodes. 
Figure 4 illustrates the skinning step. In figure 4a a simple tree from a source 




Fig. 4. Skinning the bandwidth 



to two destination nodes is shown. Node 1 is a class 1 destination, requesting 
type 1 of the data, and node 2 is a class 2 destination. Initially, on all of the 
edges enough bandwidth is reserved to accommodate the required data types, so 
the edges can certainly handle the requested bandwidth. The algorithm starts 
walking through the tree until it encounters a leaf. In the leaf, no transcoding 
needs to be done; only the requested data version has to be received, so the 
amount of bandwidth to reserve on the incoming edge is known. In figure 
4b, the marked node receives a data stream of type 1 (transcoding from 
type 1 to 1) and the incoming edge needs enough bandwidth to carry the type 
1 flow. In figure 4c the same is done in the next leaf: a type 2 data stream needs 
to be received (transcoding from type 2 to 2) and enough bandwidth is reserved 
to carry a type 2 flow on the incoming edge. Since all outgoing edges of the node 
marked in figure 4d have been handled, the transcodings needed in this node can 
be determined, as well as the bandwidth needed on the incoming edge. It is clear 
that enough bandwidth has to be reserved for a type 1 flow on the incoming edge 
(the larger of the two flows). Therefore transcodings from type 1 to 1 
and from type 1 to 2 are needed in that node. Finally, in figure 4e the source is 
reached. On all of the edges that are part of the tree enough bandwidth has been 
reserved and the required transcodings in the nodes have been determined. 
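
The skinning walk can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the implementation used for the simulations: the tree is given as a hypothetical `children` mapping, every leaf is assumed to be a destination, and a lower type number is assumed to mean a higher bandwidth, so a node must receive the lowest-numbered type requested below it.

```python
def skin_tree(children, leaf_class, source):
    """Walk the tree bottom-up and return (edge_type, transcodings).

    children:   dict node -> list of child nodes (tree rooted at `source`)
    leaf_class: dict destination node -> requested data type (1 = highest bandwidth)
    Assumes every leaf of the tree is a destination.
    """
    edge_type = {}      # (parent, child) -> data type carried on that edge
    transcodings = {}   # node -> sorted list of (from_type, to_type)

    def visit(node):
        # Types this node must deliver: its own request plus what each child needs.
        needed = set()
        if node in leaf_class:
            needed.add(leaf_class[node])
        for child in children.get(node, []):
            child_type = visit(child)
            edge_type[(node, child)] = child_type
            needed.add(child_type)
        # The incoming flow must be the highest-bandwidth (lowest-numbered) type.
        incoming = min(needed)
        transcodings[node] = sorted((incoming, t) for t in needed)
        return incoming

    visit(source)
    return edge_type, transcodings


# The situation of figure 4: one intermediate node "A" feeding two destinations.
edges, trans = skin_tree({"S": ["A"], "A": ["N1", "N2"]}, {"N1": 1, "N2": 2}, "S")
print(edges)
print(trans["A"])
```

For the intermediate node the sketch reports the two transcodings described in the text (type 1 to 1 and type 1 to 2), and a type 1 flow on the edge leaving the source.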



Branch Attach Heuristic This heuristic is based on the calculation of a 
Steiner Tree for each class of destinations. In the algorithm, edges used in trees 
with higher bandwidth requirements are favoured for reuse in the lower-bandwidth 
trees as well. The resulting trees for every class are merged into one final 
multicast tree. First the Steiner Tree is calculated for the destinations of the 
lowest class (highest bandwidth requirement). The edges of this tree are added 
to the final data transcoding tree and in this tree bandwidth is reserved for the 
current type of data. Then the cost of these edges is set to zero in the original 
network, making it favourable to reuse these edges in the trees for the other 
types. As long as not all destination classes have been added to the final tree, the 
following procedure is followed. The Steiner Tree is calculated for the destination 
nodes of the next class in order (lower bitrate). The edges of this Steiner 
Tree are added to the final tree and enough bandwidth is reserved. If a newly 
found edge is already part of the final tree, a transcoding may be necessary in 
the target node of the edge. The needed transcodings are determined, based on 
the incoming and outgoing flow. In a last step, the cost of the new edges is set 
to zero in the network. This procedure is repeated for each destination class 
in turn, in decreasing order of bandwidth requirements. Consequently this algorithm 
calculates not one Steiner Tree, as in the first heuristic, but several 
smaller ones that are combined into one tree. 
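
The class-by-class structure of this heuristic can be sketched in Python as follows. The sketch makes one deliberate simplification that is not in the text: instead of computing a true Steiner Tree per class, each destination is attached to the source by a Dijkstra shortest path. The function names, the graph representation and the example network are all hypothetical, and the graph is assumed to be connected.

```python
import heapq


def shortest_path(cost, src, dst):
    """Dijkstra on an undirected graph given as {frozenset({u, v}): cost}."""
    adj = {}
    for edge, c in cost.items():
        u, v = tuple(edge)
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, c in adj.get(u, []):
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    path, node = [], dst
    while node != src:
        path.append(frozenset((prev[node], node)))
        node = prev[node]
    return path


def branch_attach(cost, source, dest_by_class):
    """dest_by_class: {class number: [destinations]}, class 1 = highest bandwidth."""
    cost = dict(cost)   # work on a copy: costs are zeroed as edges are used
    tree = {}           # edge -> data type reserved on it
    for cls in sorted(dest_by_class):            # decreasing bandwidth requirement
        for dest in dest_by_class[cls]:
            # Stand-in for the per-class Steiner Tree: shortest-path attachment.
            for edge in shortest_path(cost, source, dest):
                tree.setdefault(edge, cls)       # keep the earlier, larger reservation
                cost[edge] = 0                   # favour reuse by later classes
    return tree


def e(u, v):
    return frozenset((u, v))


network = {e("S", "A"): 1, e("A", "D1"): 1, e("A", "D2"): 1, e("S", "D2"): 1.5}
tree = branch_attach(network, "S", {1: ["D1"], 2: ["D2"]})
for edge, cls in tree.items():
    print(sorted(edge), "carries type", cls)
```

In this small example the class 2 destination is reached through node A rather than over its cheaper direct link, because the edge from S to A already costs nothing once the class 1 tree has been built. A transcoding from type 1 to type 2 would then be needed in A.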



4 Simulations and Results 

With the non-active solution, the exact active solution and the two heuristics, a 
number of simulations have been performed in order to illustrate the advantages 
brought by transcoding and to test the performance of the different solutions. 
A network containing 36 nodes and 57 bi-directional links is used. The edge 
cost per unit of bandwidth is set to 1. The node cost (for transcoding) is also 
set to 1. The capacity of the edges and the nodes is not restricted, so all 
necessary traffic can be carried on the edges and all necessary transcodings can 
be done in the nodes. This gives a rather simplified model, which will be refined 
in a next stage to account for restricted transcoding resources and the various real 
costs of the codecs and their transcoding. The multicast data can be delivered 
in a number of different versions. Every data class requires a specific number 
of bandwidth units. Two series of simulations have been performed. In the first 
series the number of destination classes in the tree is gradually increased, keeping 
the number of destinations constant. In the second series the number of classes 
was kept constant while gradually increasing the number of destinations in the 
tree. In the first series, random sets containing 1 source node and 10 destination 
nodes are used. In these sets the number of classes is gradually increased from 
1 to 10, spread over the 10 destinations. The 10 classes require the following 
bandwidth: class 1 needs 10 units, class 2 needs 9 units, ..., and finally class 
10 needs 1 unit. For every situation, i.e. a certain number of classes spread 
over the 10 destinations, several random sets are generated, so that the results 
can be averaged. In the second series, the number of classes spread over 
the destinations is kept constant at 5 and only classes 1 to 5 are available. Here, 
class 1 requires 5 units of bandwidth, class 2 needs 4 units, ..., and finally class 
5 needs 1 unit. Now, the sets contain 1 source node and an increasing number 
of destination nodes. Again the results are averaged over a number of 
random sets for each situation. The simulation results are discussed below. 

Fig. 5. Tree cost (fixed number of destinations) 



For the first series, figure 5 shows the tree cost for the active and non-active case. 
Since the edge cost is set to 1 per unit of bandwidth, the tree cost is also 
the total amount of bandwidth needed in the tree. In the active situation, the cost 
is less than 60% of the non-active cost when 4 or more classes are used. Although 
less bandwidth is used, more transcodings are required due to the increasing 
number of classes. As is shown in figure 6, in the non-active case N-1 transcodings 
are needed, where N is the number of classes. These N-1 transcodings are all done 
by the source node. In the active case more transcodings have to be done, but 
they are distributed in the network and only a few have to be done in the source 
node. Furthermore, the bandwidth used by the total data flow leaving the source 
node is shown in figure 7. For the active case the average flow coming from the 
source is about 17 units. In the non-active case the flow amounts to at least the 
values set out by the Min-curve, i.e. the sum of the bandwidths needed by the 
available classes. From these results, coming from simulations with the exact 
solution, it is already clear that using transcoding in a multicast session reduces 
both the total bandwidth used and the load on the server or source 
node. 

Fig. 8. Difference in tree cost between heuristics and exact solution 

Fig. 9. Tree cost (fixed number of classes) 
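
As a small worked example of the Min-curve, the following Python sketch sums the bandwidths of the classes in use, using the class bandwidths of the first simulation series. Which classes are present when fewer than 10 are used is not stated in the text, so the subsets chosen below are assumptions made only for illustration.

```python
def non_active_min_flow(bandwidth_by_class, classes_in_use):
    """Lower bound on the flow leaving the source when no transcoding is done
    in the network: one separate stream per class in use."""
    return sum(bandwidth_by_class[c] for c in set(classes_in_use))


# First series: class 1 needs 10 units, class 2 needs 9, ..., class 10 needs 1.
bandwidth = {c: 11 - c for c in range(1, 11)}

print(non_active_min_flow(bandwidth, range(1, 11)))   # 55
print(non_active_min_flow(bandwidth, [1, 2, 3, 4]))   # 34
```

With all 10 classes in use the non-active source must emit at least 55 units, compared with the average of about 17 units reported for the active case.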

Some calculations of the exact solution took more than 2 hours to complete 
on a Pentium III 450 MHz. Therefore simulations were also performed using the 
two heuristics described above. In figure 8 the results of the heuristics and the 
exact solution are compared and the difference in the total tree cost between 
them is shown. The heuristics give a good approximation of the exact tree: the 
cost of the tree calculated by the heuristics differs by at most 3.5% from the 
optimal solution, while calculation times remain below 300 milliseconds. 

Now, for the second series, again the tree cost and node cost are shown. 
It is seen in figure 9 that the bandwidth needed in the active case is again 
about 60% of the bandwidth needed in the non-active case. The node cost, and 
consequently the number of transcoding operations, increases linearly with the 
increasing number of destinations, as shown in figure 10. 

Fig. 10. Node cost (fixed number of destinations) 



5 Conclusions and Further Work 

In this paper, the concept of data transcoding in multicast sessions was described. 
Some issues on the use of active networks were briefly discussed. Then, 
a number of the main advantages were presented, together with some new applications 
benefiting from the new concept. A more detailed discussion was given 
on the multicast tree optimisation problem, for which an exact solution and two 
heuristic solutions were proposed. To conclude, the results from a number of simulations 
were presented, illustrating the advantages mentioned before. In future 
work, a more refined model will be used, making it possible to study realistic situations 
where nodes have restricted resources for transcoding and where transcoding and 
bandwidth use have realistic costs. Then, the implementation of this concept will 
be tackled. The transcoding problem will be handled, together with the problem 
of setting up a multicast tree that is able to do these transcodings. An implementation 
will then give the opportunity to further explore some major performance and 
optimisation issues. 



Abstract. This paper aims to provide insight into the behavior of congestion 
control mechanisms for reliable multicast protocols. A multicast 
congestion control based on active networks has been proposed and 
simulated using ns-2 over a network topology obtained using the Tiers 
tool. The congestion control mechanism has been simulated under dif- 
ferent network conditions and with different settings of its configuration 
parameters. The objective is to analyze its performance and the impact 
of the different configuration parameters on its behavior. The simula- 
tion results show that the performance of the protocol is good in terms 
of delay and bandwidth utilization. The compatibility of the protocol 
with TCP flows has not been demonstrated, but the simulations per- 
formed show that by altering the parameter settings, the proportion of 
total bandwidth taken up by the two types of flow, multicast and TCP, 
may be modified. 



1 Introduction 

Congestion control for heterogeneous traffic mixes is one of the most challenging 
problems of the Internet. End-to-end congestion control mechanisms that combine 
fairness, resilience, high network utilization and low transit delay, and that support a 
mix of traffic, are being investigated. The problem is particularly difficult for multicast 
applications, although in the reliable multicast case the elastic nature of the traffic 
leads to fewer restrictions than in the real-time multicast application case. 

Congestion control for reliable multicast protocols has been an active area of research 
since the IETF [1] stated that any such protocol must incorporate a congestion control 
mechanism compatible with current Internet approaches (i.e. TCP). However, currently 
established proposals for reliable multicast protocols lack congestion control 
because initial work in the field was mainly focused on solving the scalability problem. 
Most recently published proposals that incorporate congestion control are thus in 
a rather immature state. 

Multicast applications come in all shapes and sizes. Their different functional and 
performance requirements usually imply differences in congestion control needs, in 
particular as regards internal group fairness, traffic elasticity and minimum acceptable 
throughput. A first classification of multicast congestion control algorithms divides 
them into two groups: receiver-driven and sender-driven. Current proposals for 
receiver-driven algorithms are mainly based on using layered communication. Receivers 
are responsible for controlling congestion by measuring the loss rate and disconnecting 
from group(s) if it exceeds a certain threshold. This approach is simple and 
effective, but can only be applied to applications that do not require full reliability 
(e.g. quality adaptive) or to bulk transfer combined with FEC to achieve reliability. A 
further disadvantage of this approach is that it is not TCP-friendly, which can be 
particularly problematic when the delay to join/drop from a group is high. 

Proposals for sender-driven protocols are mainly oriented towards bulk-data trans- 
fer or fully-reliable interactive applications. In the latter type of applications, the av- 
erage group rate will be that of the slowest receiver admitted to the group. Congestion 
control is the joint responsibility of sender and receivers. Each receiver estimates the 
"proper" rate and communicates it to the sender, which adjusts its rate to the lowest 
one indicated by the receivers. An advantage of this approach is that it appears to 
make a TCP-friendly mechanism more tractable. An example of this line of research 
is the usage of a TCP rate-estimation function [2] at the receiver end. To apply this 
function, measurements of the loss rate and estimations of the RTT are required. Two 
of the still unsolved challenges in this area are to provide an appropriate response 
time, while using measurements averaged over time, and the design of a feedback 
control mechanism that avoids implosion caused by rate notifications from receivers. 
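
The sender-driven scheme described above can be illustrated with a short Python sketch. The text does not say which rate-estimation function [2] uses, so the sketch substitutes the well-known simplified steady-state TCP throughput approximation of Mathis et al., rate = (MSS / RTT) * sqrt(3 / (2p)), purely as a stand-in. The function names and the example measurements are hypothetical.

```python
import math


def tcp_rate_estimate(mss_bytes, rtt_s, loss_rate):
    """Simplified steady-state TCP throughput estimate in bytes/s.

    Stand-in formula (Mathis et al.): (MSS / RTT) * sqrt(3 / (2 * p)).
    """
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_rate)


def sender_rate(receiver_estimates):
    """The sender adjusts its rate to the lowest one indicated by the receivers."""
    return min(receiver_estimates)


# Each receiver measures its own loss rate and estimates its own RTT.
measurements = [(0.10, 0.015), (0.25, 0.010), (0.05, 0.040)]  # (RTT in s, loss rate)
estimates = [tcp_rate_estimate(1000, rtt, p) for rtt, p in measurements]
print(round(sender_rate(estimates)))
```

The sketch also shows why feedback implosion is a concern: the sender needs one estimate per receiver before it can take the minimum.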

The approach studied in this article is the congestion control mechanism designed 
for the Reliable Multicast Active Network Protocol (RMANP) [3]. This protocol is a 
sender-based fully-reliable multicast protocol in which the reliability and congestion 
control can benefit from processing performed at some of the routers. These routers 
use active network technology [4], therefore allowing the implementation of an active 
multicast congestion control mechanism. Active routers participate in congestion 
detection, congestion recovery and congestion notification. The advantages are obvi- 
ous: within the network it is possible to know where, when, and how the transmission 
rate needs to be adapted to the particular congestion state of each network area. 

The complexity of multicast protocols, and their associated congestion control, and 
the interactions between different traffic flows in the network make it very difficult to 
predict the system behavior in an analytical way. The cost of actual implementation 
and experimentation makes simulation a valuable intermediate solution. We have 
therefore simulated the functionality and performance of the RMANP congestion 
control in order to obtain data for its evaluation and design refinement. In the follow- 
ing sections we first give an overview of the congestion control of RMANP and then 
describe the simulation topology obtained with the Tiers tool, together with the simu- 
lation scenario and methodology used. This description will allow the reader to evalu- 
ate the validity both of the simulation results obtained, and of the conclusions that 
have been drawn from them. 



2 Active Multicast Congestion Control Overview 

We briefly describe the multicast congestion control used in RMANP. A more detailed 
description of the congestion control may be found in [5]. The active congestion 
control requires the participation of the source, receivers and active routers 
(ARs). The source performs rate control, beginning the session with the minimum 
acceptable rate set by the application (parameter Rm). It adaptively adjusts the rate, 
decreasing it in case of congestion or increasing it when acknowledgements are received. 
A relevant difference from TCP is that in our case the control is rate-based 
instead of window-based. 

The multicast session will involve many receivers with different bandwidth and 
delay characteristics in the paths to them from the source so that the group must 
evolve at the throughput of the slowest receiver. However, as the application declares 
a minimum acceptable rate, receivers that cannot reach Rm are excluded from the 
session. This section describes how congestion is detected and notified to other sys- 
tems, how the systems react to congestion, and how they recover when the congestion 
is relieved. 



2.1 Congestion Detection 

The ARs and the receivers perform sequence number control. If the loss of a packet is 
detected, the system concludes that there is congestion in the upstream subnetwork. In 
the example shown in Figure 1, congestion at subnetwork 3 will be first detected at 
AR4. As packets are immediately forwarded downstream, the packet that is used to 
detect congestion is marked to avoid congestion being inferred again at the next 
downstream AR. In addition to this mechanism, an AR detects congestion when the 
buffer it uses to store excess packets (new_queue) fills up and overflows. Lost re- 
transmitted packets cannot be detected by intermediate sequence control, and for this 
reason the receiver controls the number of successive retransmission requests, signal- 
ling congestion if they exceed a certain threshold. 
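
The intermediate sequence number control and the marking of the detecting packet can be sketched as follows. This is a minimal illustration under assumptions: the class name, the per-session counter and the boolean mark are hypothetical, and neither the new_queue overflow check nor the retransmission-request threshold is modelled.

```python
class SequenceMonitor:
    """Per-session sequence number control at an AR or receiver (illustrative)."""

    def __init__(self):
        self.next_expected = 0

    def on_packet(self, seq, marked=False):
        """Return (congestion_detected, mark_to_forward_downstream)."""
        gap = seq > self.next_expected
        self.next_expected = max(self.next_expected, seq + 1)
        if gap and not marked:
            # First system to see the gap: signal congestion upstream and mark
            # the packet so that downstream systems do not infer it again.
            return True, True
        return False, marked


ar4, ar5 = SequenceMonitor(), SequenceMonitor()
for seq in (0, 1, 3):                       # packet 2 was lost upstream of AR4
    detected, mark = ar4.on_packet(seq)
    print("AR4", seq, detected)
    print("AR5", seq, ar5.on_packet(seq, marked=mark)[0])
```

Only the first system reports congestion for the missing packet; the second one, placed downstream in this hypothetical chain, sees the same gap but ignores it because the packet arrives marked.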

Other multicast congestion control proposals only perform detection at end- 
systems. The source detects congestion by time-out of a positive feedback mechanism 
(usually ACKs) and receivers detect congestion by looking for lost packets in the data 
stream. The advantage of having the ARs perform detection is that the system can 
react faster to congestion, leading to lower packet loss which, in turn, implies lower 
average delay and higher network utilization. 

Because congestion is detected in RMANP by intermediate sequence number con- 
trol, the downstream AR will detect congestion occurring at the upstream subnetwork. 
Other approaches, such as the one proposed by Faber for unicast communications in 
[6] can only detect losses that occur at ARs. This has the disadvantage that congestion 
in non-active subnetworks cannot be detected. 



2.2 Congestion Notification 

An AR or receiver that has detected congestion sends an explicit congestion indication 
(CI packet) towards the source. It first computes the proper rate (adjusted rate) of 
the session and places it in the CI sent. To compute the adjusted rate it multiplies the 
rate at which the packet was sent (included in control information in data packets) by 
an absolute multiplicative decrement (parameter Da). 

The CI packet will be processed by all upstream ARs. In this way not only the 
source but also the ARs react to the congestion. To be able to handle severe congestion, 
involving loss of congestion indications, the source implements a fall-back 
congestion detection mechanism based on the use of a time-out for the reception of 
acknowledgements. 




It is important to remark that using explicit notifications instead of implicit ones, 
such as the expiration of retransmission timers or the reception of duplicated ACKs, 
prevents the systems (ARs and source) from reacting to non-existent congestion situations. 

Fig. 1. Active multicast congestion control example (legend: ARi = Active Router i; 
Subnetwork i; Conventional Router) 



2.3 Reaction to Congestion 

The AR that has detected congestion and sent the notification towards the source 
does not react further to this congestion, because the congestion lies upstream from 
this AR and the systems responsible for reducing the rate of transmission into the 
congested area are therefore the upstream ARs. The congestion control mechanism is 
hop-by-hop, as corrective actions will be taken at every AR upstream from the 
congestion detection point, and finally also at the source. The basic mechanism is a 
gradual rate control distributed between the congestion point and the source. The 
basic idea is that each AR controls the rate at its output interfaces, setting it to the 
lowest value contained in the CIs received. Since, after applying rate control, packets 
are received at a faster rate than that of the output, excess packets are buffered in a 
queue in the AR called “new_queue”. The adjusted output rate will not be the same at 
all upstream ARs, but will be gradually reduced at each one. The rate will be decreased 
at each AR by a relative decrement factor (parameter Dr): each AR multiplies the rate 
received in the incoming CI by Dr in order to calculate the one placed in the CI that it 
will, in turn, send upstream. The objective of this mechanism is to distribute the effort of 
buffering the overflow of traffic that is in transit, until the moment at which the 
source finally receives the congestion notification and reduces its rate. With this 
solution the network can react rapidly to transitory congestion episodes, reducing the 
number of lost packets and consequently improving delay and network utilization. 
The use of a gradual decrement allows the temporary traffic overflow to be distributed 
among all the ARs involved, rather than only being buffered by the one immediately 
upstream from the congested point. Its use also allows each AR to gradually empty its 
new_queue by ensuring its output rate is higher than that of its input. 

Figure 1 shows an example in which the source is initially transmitting at 8 Kb/s. 
Subnetwork 3 becomes congested and this fact is detected by AR4, which notifies 
AR2. The CI sent includes the adjusted rate calculated by AR4. Assuming that 
parameter Da is set to 0.8, the rate in the first CI would be 6.4 Kb/s. Consequently, AR2 
would adjust its output rate to 6.4 Kb/s, and excess incoming packets (those whose 
transmission would cause the output rate ceiling to be breached) would be buffered in 
its new_queue. AR2 would also send upstream a CI with a rate indication of about 
6.34 Kb/s (6.4 × 0.99, assuming that Dr is set to 0.99). This behavior would be 
repeated at every AR until the last CI reaches the source. 
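
The arithmetic of this example can be reproduced with a few lines of Python. The function name and the number of upstream ARs are assumptions made for illustration; the values of Da and Dr are those of the example.

```python
def ci_rates(sent_rate, da, dr, upstream_ars):
    """Rates carried in successive CIs, from the detection point towards the source."""
    rate = sent_rate * da          # adjusted rate computed by the detecting system
    rates = [rate]
    for _ in range(upstream_ars):
        rate *= dr                 # each upstream AR applies the relative decrement
        rates.append(rate)
    return rates


# Source sending at 8 Kb/s, Da = 0.8, Dr = 0.99, two ARs between detector and source.
print([round(r, 3) for r in ci_rates(8.0, da=0.8, dr=0.99, upstream_ars=2)])
# [6.4, 6.336, 6.273]
```

Each AR limits its output to the rate it received, so the rate ceilings decrease slightly hop by hop towards the source.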

Other proposals [6,7] install filters that perform rate control and discard incoming 
packets that overflow the output rate. Our proposal is to absorb the overflow in the 
ARs, avoiding the retransmission of overflow packets from the source at the cost of 
some memory consumption at the ARs. Since in RMANP the ARs perform retransmissions 
of packets (local loss recovery), the rate control must also be applied to 
retransmitted packets, although these are sent with higher priority than new packets 
buffered at new_queue. 

It is important to remark that due to the explicit inclusion of the adjusted rate in the 
CIs, multiple congestion indications, which may arise in several different ways, can 
be filtered. In the example shown in Figure 1, if congestion arose in subnetwork 2 it 
would be detected by AR2 and AR3, which would then each send a CI to their upstream 
AR. AR1 would thus receive both CIs, but the one arriving second would be filtered. 
Hence, the AR only reacts once in the event of a single congestion incident being 
detected by several systems (e.g. a LAN), or being detected several times (e.g. different 
packets lost at different downstream ARs). It also means that the AR only reacts 
to the worst case in the event of several congestion incidents occurring simultaneously 
in the distribution tree. An important consequence is that RMANP is loss-tolerant and 
does not react twice for losses occurring simultaneously at different places in the 
distribution tree. This requirement of multicast congestion control has already been 
addressed by other researchers [8]. 
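
A minimal Python sketch of this filtering is given below. It assumes that an AR simply remembers the lowest rate it has applied so far and ignores any CI that does not ask for a lower one; the class and method names are hypothetical, and the later rate increase on ACK reception is not modelled.

```python
class RateController:
    """CI handling at one active router (illustrative names)."""

    def __init__(self, dr):
        self.dr = dr
        self.output_rate = None    # None: the output rate is not being limited

    def on_ci(self, adjusted_rate):
        """Return the rate for the upstream CI, or None if this CI is filtered."""
        if self.output_rate is not None and adjusted_rate >= self.output_rate:
            return None            # same incident or a milder one: already handled
        self.output_rate = adjusted_rate
        return adjusted_rate * self.dr


ar1 = RateController(dr=0.99)
print(ar1.on_ci(6.4))   # first CI: react and notify upstream
print(ar1.on_ci(6.4))   # second CI for the same incident: None (filtered)
print(ar1.on_ci(5.0))   # worse congestion elsewhere: react again
```

Because the adjusted rate travels inside the CI, the comparison needs no knowledge of where in the tree the indication originated.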



2.4 Leaving the Congested State 

In order to readjust and stabilize according to the newly available network capacity, 
the transmission rate needs to be increased. The source and the ARs increase the rate 
each time an ACK is received by adding a fraction of Rm, multiplied by the number 
of packets acknowledged, to the current rate. The fraction of Rm used by 
the source is controlled by the parameter Is. The ARs use the parameter In in the same 
way. 
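
The additive increase can be written as a one-line update. The sketch below is only an illustration: the function name is hypothetical, and the parameter values in the example are the ones listed for the simulations in Section 3 (Rm = 16 Kb/s, Is = 0.005, In = 0.05).

```python
def increased_rate(current_rate, rm, fraction, packets_acked):
    """New rate after an ACK: current rate plus fraction * Rm per acknowledged packet."""
    return current_rate + fraction * rm * packets_acked


RM = 16.0                                    # minimum acceptable rate, Kb/s
print(increased_rate(16.0, RM, 0.005, 1))    # source, Is = 0.005: +0.08 Kb/s per packet
print(increased_rate(16.0, RM, 0.05, 1))     # AR, In = 0.05: +0.8 Kb/s per packet
```

With these settings an AR raises its rate ceiling ten times faster than the source, which is consistent with the ARs being meant to empty their new_queue and stop limiting the output.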

An AR considers that the congestion situation is over when it has emptied its 
new_queue. It will then desist from controlling the output rate and will just forward 
incoming packets as soon as they are received. The source will continue increasing 
the rate as long as the application has data to send and no CIs are received. 



3 Simulation Environment 

All simulations were made using ns-2 [9]. The simulator was extended to implement 
RMANP and its congestion control mechanism. The network topology, shown in 
Figure 2, was generated using the Tiers Topology Generator. One WAN, two MANs 
and four LANs compose the internetwork. The receivers are connected to different 
LANs and the source is located in LAN1. There are seven ARs placed at different 
points of the internetwork. 

The link bandwidths were set to 180 Kbps for WAN links, 100 Mbps for MAN links and 
10 Mbps for LAN links. The propagation delay of LAN links was considered null and the 
propagation delays of MAN and WAN links are shown in Figure 2. With these 
figures, the propagation delay from the source to the farthest receiver is 587 ms. 

Some of the RMANP parameters were fixed throughout all the simulation experi- 
ments performed. The settings used were: 

• Source packet size was set to 1057 octets (data plus headers), parameter Rm to 16 
Kb/s and parameter Is to 0.005. 

• Packet cache used for local loss recovery at Active Routers: 100 packets. 
New_queue cache: 10 packets. Conventional router interface queue: 3 packets with 
FIFO drop-tail. The Active Routers' In parameter was set to 0.05. The value of the 
Dr parameter used was 0.99. 

Fig. 2. Simulated topology 

These settings are considered representative of the type of applications that would 
use the RMANP service. Moreover, the impact of these settings is reasonably predict- 
able, and our goal was to simulate the impact of other parameters considered more 
critical. Several sets of simulations of each type were carried out and since the disper- 
sion in the average results was minimal, the protocol is considered stable at least un- 
der the simulated conditions. 



4 Simulation Results 

To evaluate the RMANP performance, we simulated a session between a source and 
15 receivers that coexists with a constant background traffic that suddenly increases 
and later decreases again. In this set of simulations, parameter Da was set to 0.8. Figure 
3 shows the evolution of the instantaneous transmit rate of the source (in Bytes/s) 
in one of the simulations. The simulated time is 950 seconds. The background traffic 
is injected between routers N0 and N3. Its rate is 14.5 KByte/s between 0 and 350 
seconds; it is then increased to 18.5 KByte/s until t = 650 seconds, when it is reduced 
to 14.5 KByte/s again. The average bandwidth obtained by RMANP was 3.787 KByte/s 
between 350 and 650 seconds, and 6.467 KByte/s over the rest of the simulation. Notice 
that RMANP makes use of 94.7% of the available bandwidth between 350 and 650 seconds 
and only 80.8% the rest of the time. The reason for the lower figure in this last interval 
is the slow start at the beginning of the session. 

Fig. 3. Transmission rate of the source (in Bytes/s) against simulated time (seconds) 
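
The utilization figures can be checked with a short calculation. The sketch assumes that the bottleneck between the two routers is one of the 180 Kbps WAN links, i.e. 22.5 KByte/s, and that the available bandwidth is whatever the background traffic leaves free; neither assumption is stated explicitly in the text, and the function name is hypothetical.

```python
LINK_KBYTE_S = 180 / 8    # assumed bottleneck: 180 Kbps WAN link = 22.5 KByte/s


def utilization(rmanp_kbyte_s, background_kbyte_s, link_kbyte_s=LINK_KBYTE_S):
    """Share of the bandwidth left free by the background traffic that RMANP uses."""
    available = link_kbyte_s - background_kbyte_s
    return rmanp_kbyte_s / available


print(f"{utilization(3.787, 18.5):.1%}")   # 94.7%  (350 s to 650 s)
print(f"{utilization(6.467, 14.5):.1%}")   # 80.8%  (rest of the simulation)
```

Under these assumptions the available bandwidth is 4.0 KByte/s during the burst of background traffic and 8.0 KByte/s otherwise, which reproduces both percentages quoted in the text.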

The packet transit delays experienced at different receivers are shown in Table 1. 
Notice that the average delay for Receiver R4 is 1% above the minimum one, in spite 
of no congestion being experienced on the path to it from the source. This slight 
increase is caused by the rate reduction which is self-imposed by RMANP at active 
routers N1 and N4. The average delay at receivers R8 and R12 undergoes an increase 
of 8.5% and 8.6%, respectively, with respect to the corresponding minimum values. This 
larger increase is caused by the bottleneck trunk line between routers N0 and N3 that 
affects both R8 and R12. The effect of the bottleneck may be seen in Figure 4, where the 
delay experienced by each packet between its emission at the source and its reception 
at R12 (in seconds) is shown. Comparing Figures 3 and 4 it can be seen that the peaks 
in packet delay correspond to the instants at which the session rate exceeds the 
available one. Congestion causes an increase of delay in two ways. First, rate reduction at 
active routers implies a queuing delay (at new_queue) until the upstream active router 
also reduces its rate. Second, congestion implies loss of packets, thus delaying delivery 
to receivers. 

Table 1. Packet Transit Delays 

           Minimum            Simulation results 
Receiver   calculated (ms)    Average (ms)    Lowest (ms)    Highest (ms) 
R4         3.36               3.40            3.36           3.42 
R8         777                843             777            3339 
R12        778                845             778            3352 

The average packet delay is reduced by the effect of intermediate buffering, which 
implies a queuing delay for a packet instead of a much higher retransmission delay. 
The price to pay is the buffering requirements at active routers, which have also been 
measured in the simulations. The average number of packets stored at the different 
ARs ranges from 4.97 to 9.79, with peaks ranging from 10 to 28 packets. Therefore, 
the absolute peak of memory required is about 28 KByte for this session. Assuming that 
the memory requirement is linear with the number of flows, a router handling a T3 line 
that could handle 1002 such flows would imply a peak memory requirement of 27.4 
MByte. 
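
The peak memory estimate is a simple scaling, which the following sketch reproduces. It assumes that each 1057-octet packet is counted as roughly 1 KByte and that 1 MByte is taken as 1024 KByte; both are assumptions chosen here because they reproduce the quoted figures, and the function name is hypothetical.

```python
def peak_memory_mbyte(flows, peak_packets, kbyte_per_packet=1.0):
    """Peak buffer memory for `flows` identical sessions, in MByte (1024 KByte)."""
    return flows * peak_packets * kbyte_per_packet / 1024


print(28 * 1.0)                                # 28.0 KByte for one session
print(round(peak_memory_mbyte(1002, 28), 1))   # 27.4 MByte for 1002 sessions
```

The estimate is an upper bound in the sense that it uses the highest peak observed at any AR for every flow at the same time.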



From the different sets of simulations carried out, the parameter Da can be seen to 
affect three relevant performance figures: packet delay, end-to-end average throughput, 
and active router memory requirements. A second round of simulations has been 
carried out specifically to determine the effects of different values of Da, using values 
of 0.9, 0.7 and 0.6, in addition to the previous value of 0.8. The performance results of 
the four cases are compared in Table 2. Notice how a lower Da leads to an improvement 
in end-to-end delay and buffer requirements, but also to a decrease in throughput. 
A lower Da implies a greater rate reduction when congestion is detected, causing 
an increase in queuing delay at new_queue. The queuing delay introduced for the 
worst-case packet is T*(Rin/Rout - 1), where T is the time during which the input 
rate is higher than the output rate (reaction time of the previous active router), and the 
quotient between the input rate (Rin) and the output rate (Rout) is precisely 1/Da. 
However, a higher Da causes more packet loss because the instantaneous rate exceeds 
the available one more frequently. Notice that the two effects have opposite influences 
on delay, but the loss effect is more significant. 
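
The worst-case queuing delay can be evaluated for the four values of Da that were simulated. The sketch reads the quotient in the formula as the ratio of input rate to output rate, which equals 1/Da because the output rate is Da times the input rate, so the delay is T*(1/Da - 1). The value T = 1 second is a hypothetical reaction time chosen only to make the numbers easy to compare.

```python
def worst_case_queuing_delay(t, da):
    """Delay of the last packet queued while input exceeds output for a time t.

    During t the queue grows by t * (Rin - Rout) and is drained at Rout,
    giving t * (Rin / Rout - 1) = t * (1 / da - 1) when Rout = da * Rin.
    """
    return t * (1.0 / da - 1.0)


for da in (0.9, 0.8, 0.7, 0.6):
    print(da, round(worst_case_queuing_delay(1.0, da), 3))
```

For T = 1 s this gives 0.111 s, 0.25 s, 0.429 s and 0.667 s respectively, so the queuing component grows as Da decreases, which is the opposite trend to the loss component discussed in the text.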




Fig. 4. Packet Transit Delay in receiver R12 

Table 2. Performance results with different settings of Da 

        Average delay    Average end-to-end     Average queue 
Da      in R12 (ms)      throughput (Bytes/s)   size (packets) 
0.6     823              4978                   8.76 
0.7     830              5276                   9.14 
0.8     845              5612                   9.79 
0.9     941              5872                   12.17 



A lower Da decreases throughput since a larger rate cut means that the time required 
to increase the rate up to the available one is also longer, and therefore the 
average rate is lower. The impact of Da on buffering is caused by a combined effect 
of throughput and loss. Higher throughput requires more buffering because the 
acknowledgment time is constant. Higher losses also imply higher average 
buffering because a packet which is lost stays buffered for much longer than 
one which is successfully transmitted on the first attempt. 

A third set of simulations was performed using a mix of an RMANP session with
several TCP flows. Different parameters (line bandwidth, Da, ...) of RMANP were
modified in the various sets of simulations. The bandwidth was observed to be shared
between the TCP flows and the RMANP session. The ratio between the throughput
achieved by RMANP and that of each TCP flow ranged from 1.55 to 0.41. The difficulty
in evaluating these results arises when trying to define what the “proper” share
should be in the different cases. TCP throughput depends on round trip time and
number of flows, and varies significantly among different flows that share the same
end-points. RMANP throughput depends largely on the Rm (minimum rate) parameter, the
worst case combination of leaf round trip time and branch link bandwidth, the value
of Da, and the increase rates Is and In. Therefore, for a given topology and
collection of TCP and RMANP flows, what should the “proper” share of each be? Some
authors suggest that in order to be TCP-friendly, the bandwidth achieved by a
multicast session should be equal to the lowest of the bandwidths that a TCP session
would achieve between the source and any of the receivers, but this approach is still
not generally accepted. A possible direction for future work is to study the
definition of TCP compatibility and how to achieve it.
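
The TCP-friendliness criterion quoted above (the lowest bandwidth a TCP session would achieve to any receiver) could be computed as follows. This is a sketch only: the paper gives no throughput model, so the simplified square-root approximation of steady-state TCP throughput is used here as an assumption, and the receiver values are invented for illustration.

```python
import math

def tcp_throughput(mss_bytes, rtt_s, loss_prob):
    """Approximate steady-state TCP throughput in bytes/s using the simplified
    square-root model (an assumption, not taken from the paper)."""
    return (mss_bytes / rtt_s) * math.sqrt(1.5 / loss_prob)

def tcp_friendly_rate(receivers, mss_bytes=1460):
    """Lowest rate a TCP session would achieve from the source to any receiver.

    `receivers` is a list of (round trip time in seconds, loss probability).
    """
    return min(tcp_throughput(mss_bytes, rtt, p) for rtt, p in receivers)

# Hypothetical receivers: (RTT, loss probability).
receivers = [(0.050, 0.01), (0.200, 0.01), (0.100, 0.02)]
rate = tcp_friendly_rate(receivers)
```

Under this criterion the session rate is dictated by the worst receiver, here the one with the 200 ms round trip time.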



5 Conclusions and Future Work 

The congestion control mechanism of RMANP has been shown to adapt the group 
throughput to that available. It has also been shown to achieve a high utilization of the 
available bandwidth, subject to a trade-off between transit delay and utilization, de- 
pending on the value of Da. Average delays obtained are better than those achievable 
by end-to-end protocols because of the local retransmissions from active routers along 
the distribution tree. This makes RMANP suitable for applications sensitive to delay 
(e.g. interactive traffic). 

Active network technology has several advantages in the implementation of multi- 
cast congestion control mechanisms. Local retransmission, intermediate sequence 
number control, local rate-control and buffering of overrate packets are all feasible 
with active networks. In particular, these advantages can be achieved with only a
partial deployment of active routers. One possibility would be to place them at the 
endpoints of expensive transmission trunks, such as intercontinental lines, where the 
cost of the required processing power would be worth the improvements in delay and 
line utilization. 

The TCP compatibility of the mechanism has not yet been established, but a 
promising result has been obtained: in the simulations performed the bandwidth was 
shared, in varying proportions, between RMANP and TCP. Further work will be 
aimed at providing a quantitative definition of TCP compatibility and trying to 
achieve it by appropriate setting of the protocol parameters. It is clear, however, that 
TCP compatibility must be demonstrated under any topology and combination of 
RMANP and TCP flows. Due to the complexity involved, any such demonstration 
will inevitably require a very considerable simulation effort, even under the assump- 
tion that the proper parameter combination is known in advance. 

Among the weak aspects of the proposed congestion control mechanism, we highlight
its rate-increase mechanism. Simulation results have shown that a better adaptation
to changing available bandwidth could be obtained by using a low Da combined
with an increase mechanism that takes into account the history of the rate in use. This 
could be implemented at the source without any need to increase the processing com- 
plexity at active routers. 
Abstract. We present a framework for heterogeneous video multicasting,
considering an active network in which active nodes can filter the
video stream to satisfy different quality requests. As a part of this ap- 
proach, we propose a heuristic algorithm for the construction of a mul- 
ticast distribution tree that appropriately chooses the active nodes at 
which filtering is performed with the aim of, for example, minimizing the 
total required bandwidth. We evaluate the performance of our algorithm 
and compare it against two other approaches: simulcast and layered en- 
coded transmission. Through simulation experiments, we show that a 
larger number of simultaneous multicast sessions can be set up with ac- 
tive filtering. 

Keywords: heterogeneous multicast, active networking, video filtering 



1 Introduction 

Heterogeneous multicasting of video is a natural candidate for enjoying the
benefits of active networking. At video filtering nodes, new streams of lower
quality can be derived from the received ones, which makes it possible to satisfy
diverse quality requirements. Alternatives for dealing with heterogeneous quality
requests in video multicasting include simulcast and distribution of layered
encoded video [1]. These have the advantage that they can be used in the current
network infrastructure, but at the cost of excessive use of network resources.
Active filtering seeks to reduce the required bandwidth by choosing the location
of filtering nodes appropriately, at the price of processing overhead at some
nodes.

Research into filtering by Yeadon et al. [2] and Pasquale et al. [3] predates
active networking research, but proposes a filtering propagation mechanism to vary
the location where filtering occurs according to the requirements of downstream
clients. AMnet [4] proposes a model and an implementation for providing
heterogeneous multicast services using active networking. According to this
approach, a hierarchy of multicast groups is formed, in which some active nodes
that act as receivers in a multicast group become roots in other multicast groups,
but it is not explained how the multicast groups are formed and how the root
senders of each multicast group are elected.

H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 157-170, 2000.

In this work we aim at two objectives. First, we give a framework for het- 
erogeneous video multicasting, considering a network in which active nodes can 
perform filtering of the video stream to generate lower quality ones to satisfy 
requests of downstream nodes. In our framework, we first collect all the ses- 
sion clients’ requests, and use this information to form a hierarchy of multicast 
groups, where the top level group root is the video server. The members of 
this group are the clients which requested the highest quality video, and one 
or more active nodes which filter the video stream, producing one with lower
quality. These active nodes become roots of other multicast groups to satisfy 
the requirements of other clients. Analogously, these new multicast groups can 
have one or some active nodes as members that become roots of even lower level 
groups. Second, we propose and evaluate an algorithm to efficiently elect the 
roots of the multicast groups. The effectiveness of active filtering depends on the 
topology of the video distribution tree, but to our knowledge no previous work 
has discussed this issue. 

The rest of this paper is organized as follows: Section 2 describes our framework
for multicasting video using active node filtering; Section 3 gives the details
of the algorithm for electing an appropriate multicast distribution tree; Section 4 
evaluates its performance, comparing it with other approaches for distributing 
video; Section 5 concludes our work. 



2 A Framework for Heterogeneous Video Multicasting 
Applications 

2.1 Assumptions about the Network 

We assume a network in which some of the nodes are active. A proposed frame- 
work for active networks [5] presents a structure for active nodes, which is divided 
into three major components: the Node Operating System (NodeOS), which al- 
locates the node resources such as link bandwidth, CPU cycles and storage; the 
Execution Environments (EEs), each one of which implements a virtual machine 
that interprets active packets that arrive at the node; and the Active Applica- 
tions (AAs), which program the virtual machine provided by an EE to provide 
an end-to-end service. End systems that host end-user applications are also con- 
sidered as active nodes having the same structure. This framework also defines 
an Active Network Encapsulation Protocol (ANEP) header, which must be in- 
cluded in each active packet to distinguish to which EE it must be sent to be 
processed. Legacy (non-active) traffic whose packets don’t include the ANEP 
header must also be supported by active nodes, and in this case they act as 
conventional nodes. 
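
The dispatching role of the ANEP header described above can be sketched as follows. This is a simplified illustration, not the ANEP wire format: packets are modelled as dictionaries, and the key `anep_type_id`, the EE table and the decision to drop packets for unknown EEs are all assumptions made for the example.

```python
def dispatch(packet, execution_environments, forward):
    """Send an active packet to its EE; forward legacy packets unchanged.

    `packet` is a dict; the key "anep_type_id" is a hypothetical stand-in
    for the EE identifier carried in the ANEP header.
    """
    type_id = packet.get("anep_type_id")
    if type_id is None:                       # legacy (non-active) traffic
        return forward(packet)                # act as a conventional node
    ee = execution_environments.get(type_id)
    if ee is None:                            # no EE registered for this identifier
        return None                           # drop the packet (an assumption)
    return ee(packet)                         # let the EE interpret the packet

# Hypothetical node with a single EE registered under identifier 1.
ees = {1: lambda p: ("EE-1", p["payload"])}
forward = lambda p: ("forwarded", p["payload"])
```

For example, `dispatch({"payload": "x"}, ees, forward)` takes the conventional forwarding path, while a packet carrying `"anep_type_id": 1` is handed to the registered EE.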

Active nodes differ from conventional nodes in that they have memory and
processing resources that can be used by end users to customize the network
behavior. We assume that AAs can leave state in the active nodes when necessary,
i.e., store values or identifiers that will be used by subsequent packets of the
same application.

Code that runs in an active node can be classified as trusted or untrusted.
We call trusted code those programs whose execution is known in advance to be
safe, i.e., will not harm the node. We can consider them as modules that enhance
the node functionality, with few restrictions on the use of node resources. On
the other hand, the execution of untrusted code must be kept within certain
limits (e.g., by restricting the API it can use and limiting its size and the
time and memory resources it can consume). It is usually executed through an
interpreter to enforce security checks, which slows down its execution.

Some existing implementations make this distinction. SwitchWare [6] divides
code between active packets and active extensions: active packets replace
traditional packets and contain data and code but with limited functionality, and
they are used for inter-node communication, while active extensions can be
dynamically loaded to give nodes added functionality. ANTS [7] uses extensions
to allow the existence of code with privileges or whose size is too large to be
transferred using its capsule-based code distribution mechanism.

Video filtering code, excepting the simplest filter types, is relatively large, 
resource consuming, and requires fast execution, and is not likely to fit into the 
limitations of untrusted code. Handling the filter programs as trusted modules, 
we can use the advantage that video streams have a relatively large number 
of packets that require the same type of processing, by preloading the filtering 
code to work over the entire stream. Handling the code as trusted can also let 
us optimize it for faster execution. 

We assume that the active network has mechanisms for loading trusted code 
into the nodes (e.g., trusted code servers from which it can be transferred.) We 
don’t discuss security issues in this paper. 

In contrast, code used for protocol signaling is used mainly for leaving soft- 
state in or getting information from active nodes, and can fulfill its purpose 
running within the restrictions of untrusted code. 

To construct the video distribution tree appropriately, the algorithm should
have information on the network. Since active nodes have the functionality of
conventional nodes and support non-active traffic, we assume that they
communicate with conventional nodes using standard network layer protocols such as
OSPF [8] in order to discover the network topology. Similarly, it could be
necessary to consider a signaling protocol between active nodes in order to
exchange information exclusive to them. Due to security concerns, the information
that active nodes exchange using this protocol must be limited. We assume that
such a protocol already exists and that it lets an active node “discover” other
active nodes in the network and query some basic information such as the EEs
running on them.

2.2 Sketch of the Application 

Our approach for heterogeneous video multicasting considers filtering at some 
properly located active nodes. The video server is required to produce and send 
the video stream of the highest quality among all the clients’ requests. Then it
is transcoded to a lower quality at one or more intermediate active nodes to, for 
example, fit to the available bandwidth of downstream links. Since the distribu- 
tion of the video streams with different qualities is done using current multicast 
protocols, when the video stream needs such transformation, the designated ac- 
tive node first subscribes to the corresponding multicast group to receive the 
stream to transform. Then it filters/transcodes the stream, and becomes root 
of a new multicast group of which the clients requesting the transformed video 
stream are members. We can re-filter an already filtered video stream in order 
to obtain another one with lower quality, and hence a hierarchy of multicast 
groups can be conceived. This idea was pioneered in AMnet [4]. Fig. 1 depicts 
this approach. 

Each multicast group in the hierarchy is constituted using network layer
multicast (i.e., IP multicast). Those groups are “glued” and ordered hierarchically
using a protocol implemented for the active network. It is also possible to have
active multicast protocols that replace network layer multicast in some groups,
but the interaction between multicast groups is still controlled by an upper
layer active protocol. This is shown in Fig. 2.

Yeadon et al. [2] presented several approaches for filtering MPEG video
streams. The simplest method for rate reduction is mere picture discarding, which
consists of progressively eliminating B, P and I pictures. This approach is of
limited applicability, since it only allows the reduction of bandwidth by
modifying the frame rate. Beyond picture discarding, other approaches include
partial decoding and re-encoding of the video streams. For example, low-pass
filters discard the high frequency DCT coefficients, and requantization filters
increase the value of the quantization factor to increase the number of zero DCT
coefficients. They implemented those filters for MPEG-1 video streams, and found
that although end-to-end delay and jitter are increased, the filters are feasible
for continuous media streams. Here we do not specify which filtering approach to
use, and assume that any one is usable. Nevertheless, it is necessary to consider
that, depending on their complexity, some filters may be impossible to implement
or could introduce non-negligible delays at the active nodes.
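
Two of the filter types mentioned above, picture discarding and requantization, can be illustrated with a minimal sketch. It is heavily simplified and is not the implementation of Yeadon et al.: pictures are represented only by their type letter, DCT data by a flat list of quantized levels, and the function names and sample values are hypothetical.

```python
def discard_pictures(frames, drop=("B",)):
    """Picture discarding: drop every picture whose type is listed in `drop`."""
    return [f for f in frames if f not in drop]

def requantize(levels, q_in, q_out):
    """Requantization: rescale quantized DCT levels from step q_in to a
    coarser step q_out, truncating toward zero."""
    return [int(level * q_in / q_out) for level in levels]

gop = list("IBBPBBPBB")                       # picture types of one group of pictures
levels = [12, -7, 3, 1, -1, 0, 2, 1]          # hypothetical quantized DCT levels
coarse = requantize(levels, q_in=10, q_out=30)
```

Dropping B pictures first and then P pictures lowers the frame rate, while requantizing with a larger step turns small levels into zeros, which is what reduces the bit rate after entropy coding.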

We now describe the components of an application that performs heterogeneous
video multicast employing filtering. They are shown schematically in Fig. 3.

1. Session announcement. The server uses a well-known multicast address to 
inform the possible clients about the session contents. Information includes 
the available qualities and required amount of resources. The protocol used 
to send these messages can be similar to the SAP protocol [9] used in the 
MBONE. 

2. Session subscription. Each of the clients that wants to participate in the ses- 
sion sends a request to the server containing the desired quality parameters. 
The quality requested by the client reflects not only the user’s preference on 
the perceived video quality but also limitations on its available resources [10]. 
In the algorithm in Section 3, we assume that quality corresponds to one QoS
dimension for simplicity, but it is possible to consider more parameters, e.g., 
quantization scale and frame rate. 

3. Derivation of the distribution tree. After the requests are collected, the server
defines the composition of the multicast groups and the active nodes that
are going to perform filtering, considering the network condition, i.e.,
topology, resource availability and clients’ requests. The calculation algorithm
is explained and evaluated in the following sections.

4. Set up of filtering nodes. As explained before, we assume filtering code to be 
preloaded prior to sending the video stream. We require a signaling procedure 
to inform the designated nodes that they are roots of multicast groups and 
to load the required filtering program. It is possible that node set up fails due 
for example to insufficient resources, and in this case, we must go back to the 
previous step and choose a new node and therefore a different distribution 
tree. 

For node set up, the server must send the following information: 

- Multicast address as receiver: the active node receives the video data to 
be filtered as a member of this multicast group. 

- Multicast address as sender: the active node distributes the filtered video
data using this multicast address.

- Filtering parameters: the sender sends a reference to the required code,
and the required parameters, e.g., the quantization scale in a requantization
filter. As explained before, a designated filtering node must pre-load
the filtering program before the start of the video transmission; set up
fails if it is not able to do so.

We assume the use of soft state: after set up is done, it is necessary to send
“refresh” messages periodically to keep the node waiting for packets to be
processed by the filtering code. If no refresh messages are sent, it is assumed
that filtering is no longer needed and the node releases the reserved resources.

5. Client subscription to the multicast group. We assume the use of the existing
IP multicast protocols, such as IGMP for client-router communication,
and DVMRP and MOSPF between routers [11]. IP multicast requires each
client to join a multicast group specifying the group IP address. In our
approach, the sender informs each client of the IP addresses of the multicast
groups to which it should subscribe. On receiving the multicast group address,
the client performs the corresponding subscription.

6. Data transmission and feedback. The server multicasts the video stream of 
the highest quality to requesting clients and active nodes which filter it to 
get the lower quality streams. 

Although not discussed in detail in this work, the nature of best-effort
networks makes it necessary to monitor the reception conditions of the clients,
since usable bandwidth for the video session is not assured. Active nodes
can be used to check this. The advantage of having a hierarchy of groups is
that feedback implosion can be controlled. Each client sends feedback
messages only to the root of the multicast group to which it is subscribed. Results
are consolidated by the active node acting as root, which in turn sends a
report containing its own reception condition information and/or consolidated
information of its multicast group to the root of the parent multicast group.
Parameters of interest for video streaming applications include packet loss,
delay and jitter. We can use RTP packets [12] to send the data, and then
infer those parameters. The utilization of this information, i.e., how to use
it to dynamically modify the distribution tree, is left for future study.

Fig. 4. Example of a state tree
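
The node set-up information and the soft-state behaviour described in step 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not a protocol specification: the class and field names, the 30-second lifetime and the use of a simple counter for node resources are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FilterState:
    recv_group: str      # multicast address the node joins as receiver
    send_group: str      # multicast address used for the filtered stream
    filter_ref: str      # reference to the preloaded filtering code
    params: dict         # e.g. {"quantizer_scale": 30}
    expires_at: float    # time at which the soft state is discarded

class ActiveNode:
    def __init__(self, capacity, lifetime=30.0):
        self.capacity = capacity    # filtering operations the node can afford
        self.lifetime = lifetime    # hypothetical soft-state lifetime (seconds)
        self.filters = {}           # send_group -> FilterState

    def setup(self, recv_group, send_group, filter_ref, params, now):
        """Install a filter; return False if set up fails for lack of resources."""
        if len(self.filters) >= self.capacity:
            return False
        self.filters[send_group] = FilterState(
            recv_group, send_group, filter_ref, params, now + self.lifetime)
        return True

    def refresh(self, send_group, now):
        """A "refresh" message extends the lifetime of an installed filter."""
        state = self.filters.get(send_group)
        if state is not None:
            state.expires_at = now + self.lifetime

    def expire(self, now):
        """Release the resources of every filter that was not refreshed in time."""
        for group in [g for g, s in self.filters.items() if s.expires_at <= now]:
            del self.filters[group]
```

When `setup` returns False, the server would go back to the derivation step and choose a different node, as described in step 4.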



3 Algorithm for Construction of the Multicast 
Distribution Tree 

In this section we detail our approach for the construction of the multicast
distribution tree prior to the start of the transmission. As described in
Section 2, it consists of a hierarchical arrangement of multicast groups, and the
purpose of the algorithm is to adequately elect the root and members of each of
them and to choose on which active nodes filtering must be done.

For simplicity, we assume one QoS dimension, and therefore each client
request can be expressed by a scalar value that denotes the requested quality. We
also assume that the paths the multicast routing algorithm uses are the same as
the unicast paths from the source to each one of the destinations, as created by
Dijkstra’s algorithm, since this coincides with the multicast routing algorithms
used in dense mode subnetworks, such as DVMRP and MOSPF.

Our algorithm forms a distribution tree on a request-by-request basis, taking
the requests in descending order of quality. When there are several requests
with the same quality, we first take the ones from the clients closer to the
sender. We try to use the sender to stream to the clients that require the
highest quality, and choose the nodes located in the best place to perform
filtering. Each designated active node becomes the root node of a new multicast
group for a filtered video stream of lower quality. The filtered stream is then
sent to clients that demanded lower quality streams.
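
The order in which requests are processed can be written as a single sort. The tuple layout (client, quality, hops to the sender) and the sample requests are assumptions made for the sketch.

```python
def order_requests(requests):
    """Sort (client, quality, hops_to_sender) tuples by descending quality,
    breaking ties in favour of the client closer to the sender."""
    return sorted(requests, key=lambda r: (-r[1], r[2]))

# Hypothetical requests: (client, requested quality, hops to the sender).
requests = [("c1", 2, 3), ("c2", 4, 5), ("c3", 4, 2), ("c4", 1, 1)]
ordered = order_requests(requests)
```

Here the two quality-4 requests come first, with the closer client `c3` ahead of `c2`.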
Each step in the construction of the tree defines a state. The state is defined
by a variable c that stands for the number of clients that have already been
considered, and by the characteristics of the distribution tree needed to serve
those clients, that is, which nodes have been used to filter and produce, if any,
the stream with the requested quality. Fig. 4 depicts a sample state tree. Each
state is denoted as c-i, where c stands for the number of clients, i ≤ Nc is the
state index, and Nc is the number of states in that round. In the first round,
there is only one state, 1-1, where only one client with the highest demand is
satisfied by being provided the video stream at the required quality directly from
the server. From a state in round c, it is possible to derive several states for
round c + 1, depending on how the stream that the new client demands has been
generated.

When deriving states from a round in the state tree, we define a set of “can- 
didate senders” to provide the requested stream to the client newly considered 
in the next round. Either the original server of the video sequence or any of the 
active nodes in the network can be the candidate sender. For a given flow request 
and candidate sender, one of the following situations is possible: 

1. The candidate sender is already requested to relay a stream with the desired 
quality by a previously processed client. In this case the client subscribes to 
the multicast group the stream belongs to. 

2. The candidate sender is already requested to relay a stream with a quality 
higher than the one requested. In this case, this stream must be filtered 
at this candidate sender. Then, a new multicast group is created with the 
candidate sender as the root, and the requesting client becomes a member 
of this multicast group. 

3. The candidate sender is not relaying a flow. In this case, the candidate sender 
must first subscribe to a multicast group, filter the stream that it receives as a
member of this group, and become the root of a new multicast group. The 
requesting client subscribes to this new group to get the stream. 
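
The three situations above amount to a small decision function. The sketch below is illustrative only: the function name, the action labels and the representation of a candidate sender by the set of qualities it already relays are assumptions.

```python
def plan_delivery(relayed_qualities, requested):
    """Return the action a candidate sender takes for a request.

    `relayed_qualities` is the set of stream qualities the candidate
    already relays or produces (empty if it relays nothing).
    """
    if requested in relayed_qualities:
        return "join_existing_group"             # situation 1
    if any(q > requested for q in relayed_qualities):
        return "filter_and_create_group"         # situation 2
    if not relayed_qualities:
        return "subscribe_filter_create_group"   # situation 3
    # Requests are taken in descending order of quality, so a candidate that
    # relays only lower qualities should not occur; treated as an error here.
    raise ValueError("candidate relays only lower-quality streams")
```

For instance, a node that already relays quality 4 and receives a request for quality 2 falls under situation 2 and becomes the root of a new group.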

The election of the filtering nodes is based: 

1. On the distance, i.e., number of hops, between the client and the candidate 
node. The first candidate to choose is the closest one to the client that 
already belongs to the distribution tree, i.e., that relays or filters a flow to 
satisfy requests of previous rounds. The next ones are chosen close to this 
one. 

2. On a function f that considers other factors such as total bandwidth used,
link utilization, and/or the use of node resources. This function can be
thought of as a measure of how good the complete distribution tree being
formed is. A lower value of f means a better distribution tree.

For simplicity, we assume only one variable that comprises the node resources, 
and that a filtering operation reduces the value of this variable by a predefined
amount. If one active node has already exhausted its resources, filtering cannot 
be performed, and it is not considered as a candidate sender. 
As described above, our algorithm belongs to the category of exhaustive
search algorithms. This means that the number of possible states in each round
directly affects the efficiency of our algorithm. In the worst case, the number of
candidate senders is equal to the number of active nodes in the network, say A,
plus the original server. In such a case, the number of states Nc in round c
becomes (A + 1)^(c-1). Since this is computationally expensive if the number of
requests or active nodes in the network is not small, two parameters were defined
to restrict the number of states Nc to analyze:

- We limit the number of candidate senders to expand in each round to a
fraction b of the total candidate senders.

- We restrict the number of new states generated in a round to a maximum
of m.

In each round, we select up to a maximum of m states to expand; the states
chosen are the ones with the lowest values of f. Each state is expanded with
b × (A + 1) new states, in which each new state implies a different candidate
sender elected to satisfy the request of the next client. The election of these new
states is done by the distance (number of hops) criterion explained above. In
this paper, we have not analyzed the effect of the values of b and m, and we
chose them empirically for our evaluation experiments. We continue expanding
the state tree until all the clients’ requests are satisfied. Then, the state with
the lowest f is chosen.
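
The pruned expansion of the state tree can be sketched as a search that keeps only the best states per round. This is an interpretation of the description above, not the authors' code: a state is modelled as a tuple of (request, sender) pairs, and `candidates_for`, `cost` and the toy hop counts are hypothetical.

```python
import math

def count_states(active_nodes, round_c):
    """Worst-case number of states in round c without pruning."""
    return (active_nodes + 1) ** (round_c - 1)

def build_tree(requests, candidates_for, cost, b=0.5, m=10):
    """Pruned search over assignments of one sender per request.

    requests       -- requests in processing order
    candidates_for -- function(state, request) -> candidate senders, nearest first
    cost           -- the function f; lower is better
    b              -- fraction of candidate senders expanded per state
    m              -- maximum number of states expanded per round
    """
    states = [()]                                    # a single empty assignment
    for request in requests:
        new_states = []
        for state in sorted(states, key=cost)[:m]:   # the m states with lowest f
            candidates = candidates_for(state, request)
            keep = max(1, math.ceil(b * len(candidates)))
            for sender in candidates[:keep]:         # closest candidates only
                new_states.append(state + ((request, sender),))
        states = new_states
    return min(states, key=cost)

# Toy instance: hop counts from each candidate sender to each client.
hops = {"c1": {"server": 2, "n4": 3}, "c2": {"server": 4, "n4": 1}}
candidates_for = lambda state, req: sorted(hops[req], key=hops[req].get)
cost = lambda state: sum(hops[r][s] for r, s in state)
best = build_tree(["c1", "c2"], candidates_for, cost, b=1.0, m=10)
```

With A = 3 active nodes, the unpruned tree already has 64 states in round 4, which is why the two limits are applied in every round.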



3.1 Example 

Fig. 5 shows an example network topology with 10 nodes. Active nodes are 
marked with squares and non-active ones with circles. Client requests are indi- 
cated with unfilled circles with a number that represents the requested quality. 
The server is attached to node 3. When the sender is attached to an active node, 
we must distinguish if the filtering is performed at the active node, or if the 
stream is provided by the sender. 

The qualities are related to the bandwidth according to the data in Table 1,
taken from previous work by our research group [13] for the MPEG-2 video
coding algorithm [14]. In the layered video case, the layers must be piled up to
achieve higher quality video. For example, the bandwidth required for a stream
of quality 4 is given as 5.19 (layer 1) + 3.56 (layer 2) + 4.89 (layer 3) + 9.01
(layer 4) = 22.65 Mb/s. The different qualities are obtained by varying the
quantizer scale, and active nodes derive the video stream of lower quality by
de-quantizing and re-quantizing the received stream.
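
The layered bandwidths are cumulative sums of the per-layer rates quoted above, which a few lines of code make explicit. The single-layer figures are those of Table 1; the variable names are chosen for the example.

```python
from itertools import accumulate

layer_rates = [5.19, 3.56, 4.89, 9.01]     # Mb/s, layers 1 to 4
# Bandwidth needed to receive quality q is the sum of layers 1..q.
layered = [round(x, 2) for x in accumulate(layer_rates)]
single_layer = [5.4, 6.6, 8.8, 14.4]       # Mb/s, qualities 1 to 4 (Table 1)
# Extra bandwidth that layered coding needs compared with a single-layer stream.
overhead = [round(l - s, 2) for l, s in zip(layered, single_layer)]
```

The resulting `layered` list reproduces the layered-video column of Table 1, and `overhead` shows that the cost of layering grows with the requested quality.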

Fig. 5 shows the multicast groups formed by our algorithm. Arrows show
the required streams, and arrow tips point to multicast group members. Two
filtering processes are needed in node 4 and one in node 9. It must be noted
that active node 4 becomes a member of multicast group 1 just to provide filtered
streams to clients in nodes 1 and 6.
Table 1. Required bandwidth for streaming video (Mb/s)

quality              single-layer    layered
(quantizer scale)    video           video
4 (10)               14.4            22.65
3 (20)               8.8             13.64
2 (30)               6.6             8.75
1 (40)               5.4             5.19



4 Evaluation 

In this section, we show the effectiveness of our proposed algorithm through 
some numerical experiments. We generate random topologies using Waxman’s 
algorithm [15], and choose the parameters appropriately to generate topologies 
with an average degree of 3.5, to try to imitate the characteristics of real net- 
works [16]. We assumed the proportion of active nodes in the network to be 0.5. 
For simplicity, each filtering operation is assumed to use the same amount of 
resources. We also assumed that the number of filtering operations that each 
active node can afford is a random value between 15 and 30. The location of ac- 
tive nodes is chosen at random. The location of the server, the clients and their 
corresponding requests’ qualities are also generated randomly, and vary from 
one experiment to the other. Clients can request the video stream in one of four 
available video qualities, according to Table 1. We apply two other approaches 
for multicast tree construction to the same topologies for comparison purposes. 
Those are simulcast and distribution of layered coded video. 
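
A Waxman-style topology generator can be sketched as follows. It uses the usual form of the model, in which nodes are placed at random in the unit square and an edge (u, v) is created with probability beta * exp(-d(u, v) / (alpha * L)), where L is the largest distance between two nodes. The parameter values used in the paper are not given, so those below are placeholders, and the sketch does not guarantee a connected graph.

```python
import math
import random

def waxman_graph(n, alpha, beta, seed=None):
    """Random graph with edge probability beta * exp(-d / (alpha * L))."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    dist = lambda u, v: math.dist(pos[u], pos[v])
    # L is the maximum distance between any two nodes.
    L = max(dist(u, v) for u in range(n) for v in range(u + 1, n))
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < beta * math.exp(-dist(u, v) / (alpha * L)):
                edges.add((u, v))
    return pos, edges

def average_degree(n, edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * len(edges) / n

# Placeholder parameters; they would have to be tuned to reach a target degree.
pos, edges = waxman_graph(20, alpha=0.3, beta=0.4, seed=1)
```

In practice alpha and beta would be adjusted, for example by repeated generation, until `average_degree` is close to the target value of 3.5.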




Fig. 5. Multicast groups for our algorithm 
Fig. 6. Average bandwidth (Mb/s) for the first ten sessions



The definition of f, which is used to evaluate the effectiveness of the built
tree in the algorithm, can be modified according to which network parameters
are most important in the construction of the distribution tree. We performed
the evaluation using two simple definitions, those are for minimizing bandwidth
and minimizing link utilization, which we’ll call f_1 and f_2:

$$f_1 = \sum_{i \in U} B_i$$

$$f_2 = \frac{\sum_{i \in U} B_i}{|U|}$$

where i denotes a used link, U is the set of used links, and B_i denotes the used
bandwidth in link i. With f_1 we wanted to minimize the total bandwidth used
per session, and with f_2 we expected that our algorithm could perform some
sort of “load balancing,” to avoid congesting a single link.
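
The two cost functions can be compared on a small example. The sketch assumes, following the Summary, that the first function is the sum of the bandwidth used on each link and the second is the average bandwidth over the used links; the two candidate trees and their link names are invented for illustration.

```python
def f1(link_bandwidth):
    """Total bandwidth: sum of the bandwidth used on each used link."""
    return sum(link_bandwidth.values())

def f2(link_bandwidth):
    """Average bandwidth per used link: f1 divided by the number of used links."""
    return f1(link_bandwidth) / len(link_bandwidth)

# Two hypothetical trees for the same session: used link -> bandwidth (Mb/s).
tree_a = {("s", "n1"): 14.4, ("n1", "c1"): 14.4}
tree_b = {("s", "n1"): 14.4, ("n1", "n2"): 8.8, ("n2", "c1"): 8.8, ("n2", "c2"): 6.6}
```

Here the first function prefers `tree_a` (28.8 against 38.6 Mb/s in total), while the second prefers `tree_b` because its bandwidth is spread over more links. This is consistent with the observation that minimizing the average tends to increase the number of used links.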

We compare our algorithm increasing the number of sessions in the network 
to see how many sessions can be simultaneously set up and provided for users. 
In the experiments, all the links are assumed to have a bandwidth of 100 Mb/s. 
We multiplex sessions, each of which is set up according to our algorithm, until 
the bandwidth of any link is exhausted. Here, we should note that the network 
we consider is best-effort and the constraint on the available link bandwidth is 
not taken into account in our algorithm stated in Section 3. Thus, the number 
we consider here is that of simultaneously acceptable sessions without causing a 
seriously overloaded link. The sessions are independent, and we do not use the 
information of the links used by the other sessions to build the current tree. 
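
The counting of simultaneously acceptable sessions can be sketched as follows. It assumes that each session is summarized by the bandwidth it uses on each link and that a session is rejected as soon as it would push some link above its capacity; the function name and data layout are hypothetical.

```python
def max_simultaneous_sessions(sessions, capacity=100.0):
    """Count sessions admitted, in order, before any link exceeds `capacity`.

    Each session is a dict mapping a link to the bandwidth (Mb/s) it uses there.
    """
    load = {}
    for count, session in enumerate(sessions):
        for link, bw in session.items():
            if load.get(link, 0.0) + bw > capacity:
                return count               # this session would exhaust a link
        for link, bw in session.items():   # admit the session
            load[link] = load.get(link, 0.0) + bw
    return len(sessions)

# Hypothetical sessions sharing link "a".
sessions = [{"a": 40.0}, {"a": 40.0, "b": 10.0}, {"a": 40.0}]
```

With 100 Mb/s links, the third session above would raise the load on link "a" to 120 Mb/s, so only two sessions are counted.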
Fig. 7. Maximum number of simultaneous sessions (out of 15)



In Figs. 6 and 7, experiments 1-10, 11-20, 21-30, 31-40, 41-50, 51-60 refer 
to 20-nodes 10-requests, 20-nodes 20-requests, 20-nodes 50-requests, 50-nodes 
10-requests, 50-nodes 20-requests, and 50-nodes 50-requests cases, respectively. 

Fig. 6 shows the average bandwidth required to establish all the first ten
sessions at the same time. f_1 shows the lowest value in all cases. Even though
we chose f_2 to minimize the average bandwidth of used links in each session,
when we sum over all the sessions, f_2 results in the highest values. Between
them lie the values for simulcast and layered video. When the number of requests
is small (10 requests), the average bandwidth used by layered encoded distribution
is greater, but for a larger number of requests it is surpassed by the values of
simulcast.

Fig. 7 shows the maximum number of simultaneous sessions, up to 15, that
could be set up using each one of the three methods, i.e., the proposed
algorithm, simulcast and layered distribution. The results show performance in the
following order, from better to worse: the proposed algorithm using f_1, the
proposed algorithm using f_2, layered transmission, and simulcast. There were a
few cases in which our proposed algorithm was surpassed by the layered video
approach. We expect this to occur when we have the same stream with different
qualities over the same link, congesting it as occurs in simulcast. This occurs,
for example, when we have several clients connected to a non-active node that
request different quality streams.

Even when the locations of senders are concentrated in a region of the network,
the advantage of f_2 is relatively small, although results are not shown in this
paper due to space limitations. With f_2 we expected to increase the number of
possible simultaneous sessions, reducing the bandwidth used per link, at the
expense of increasing the number of used links. However, f_2 only greedily
increases the number of used links in the tree, sometimes misplacing the filtering
location.

5 Summary

We presented a framework for multicasting video to a heterogeneous group of 
clients, considering a network in which active nodes can perform filtering of the 
original video stream to satisfy different quality requirements. In our approach, 
all the quality requests are collected and the video server infers a multicast 
distribution tree prior to the video transmission. 

We then presented an algorithm for selecting the filtering nodes in this
distribution tree, which aims to minimize a function f that can be set to consider
some network parameters, to achieve efficient use of the network resources. We
evaluated our algorithm with two simple definitions of f: the total
bandwidth used, i.e., the sum of the bandwidth used in each link, and the average
bandwidth of used links. We compared our algorithm with two other methods
of distributing video that do not consider the use of active nodes, simulcast and
layered encoded distribution, and found that with our algorithm we can set up
a greater number of simultaneous sessions, meaning a more effective use of the
available bandwidth of the network, but at the expense of requiring processing
capability at the network nodes.
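
To make the two definitions concrete, the following Python sketch evaluates both for a single session. It is only an illustration under assumptions: the dictionary of per-link bandwidths, the link names and the function names are hypothetical and are not taken from the evaluation above.

```python
def total_bandwidth(link_bw):
    """First definition: sum of the bandwidth used on every link of the tree."""
    return sum(link_bw.values())


def average_bandwidth(link_bw):
    """Second definition: mean bandwidth over the links that carry traffic."""
    used = [bw for bw in link_bw.values() if bw > 0]
    return sum(used) / len(used) if used else 0.0


# Hypothetical session: bandwidth (Mbit/s) used on each link of the tree.
session = {("s", "a"): 4.0, ("a", "b"): 4.0, ("a", "c"): 2.0, ("c", "d"): 0.0}

print(total_bandwidth(session), average_bandwidth(session))
```

Unused links (bandwidth 0) are excluded from the average, which is why spreading a session over more links can lower the second measure while raising the first.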

Future research topics include the choice of other definitions of f to
improve the distribution tree, consideration of the effect of delays introduced at
the filtering nodes, and analysis of how reception feedback can be used to dynamically
modify the multicast tree after the video transmission has begun.

Abstract. When group applications such as virtual environments,
multiplayer games and battlefield simulations are distributed, they 
generate communication needs that are not met by today’s 
communication infrastructure. The current Internet infrastructure is 
built around point-to-point best effort packet switching design for 
unicast applications, whereas the above family of applications require 
reliable, low latency, ordered multicast. The introduction of Active 
Networking resources into the switching fabric opens the door to a new 
genre of protocols that are better able to meet these needs. By using the 
services provided by this new infrastructure we have created a novel 
protocol, ATOM, which addresses the needs of such applications and 
provides a totally ordered reliable multicast service with optimal 
fairness. We begin this paper by exploring the context of this 
application domain and the relevant technologies. We continue by 
showing the ATOM design. We finish by comparing ATOM against 
other approaches available and show the improvements that can be 
gained through its use. 



1 Problem Context 

Distributed applications have communications needs that are not met well by the
current networking infrastructures. Real time applications such as multiplayer games
and virtual environments need high speed, reliable, ordered communication between
all of the participants. Work has been done to piggyback these facilities onto current
networks with varied success, e.g. ISIS, HORUS, Amoeba and Psync [1, 2, 3, 4]. The
reason for this is that the current network infrastructure is built around a best-effort
point-to-point model. The switching fabric within the network is very good at simple
packet forwarding and we can now switch gigabytes of data through a switch each
second. We can effectively communicate data at high speed with little loss between
two parties. Whilst there may be sufficient resources to distribute multipoint data
from m senders to n receivers through n × m different connections, conventional
wisdom dictates it is desirable to use multicast to reduce packet copies and increase
bandwidth efficiency.

H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 171-179, 2000. 
IP Multicast [5, 6] offers multicast services over an IP network. Packet forwarding
is best-effort, meaning that packets can be lost or reordered as they travel through
the network. Naively attempting to implement loss recovery over IP Multicast raises
many problems. Negative acknowledgement based schemes may generate a NACK
implosion, where the sender is overloaded by requests for the lost packet to be
resent. Approaches such as Scalable Reliable Multicast (SRM) [7] attempt to fix this
by allowing for loss recovery within localised sub-areas of the network. This recovery
leads to high latencies if the group is sparsely distributed over the network topology,
since recovery only uses the group end-systems. FEC schemes such as [8], which use
source coding techniques to improve redundancy, suffer from equivalent encoding
delays. If we allow the nodes within the network to participate in the delivery of
packets, then we can reduce latency by delivering from the point of loss.

We also need to consider the issue of packet ordering. In a totally ordered multicast 
(sometimes known as an atomic broadcast) all messages are delivered to all interested 
parties in the same order. If all the group members have synchronised clocks, then 
total ordering is trivial through the use of timestamps and an ordering over sender ids 
to distinguish between simultaneous timestamps. In the absence of synchronised 
clocks, alternative mechanisms must be used. 
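
With synchronised clocks, the ordering rule described above reduces to a sort on the pair (timestamp, sender id). A minimal Python sketch, assuming messages are represented as hypothetical (timestamp, sender_id, payload) tuples:

```python
def total_order(messages):
    """Order messages by timestamp, breaking ties with the sender id."""
    return sorted(messages, key=lambda m: (m[0], m[1]))


# Two messages share timestamp 12; the sender id decides their order.
msgs = [(12, "b", "move"), (10, "c", "join"), (12, "a", "fire")]
ordered = total_order(msgs)
print(ordered)
```

Because every member applies the same deterministic key, all members arrive at the same order regardless of the order in which the messages were received.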

Total ordering is easily implemented with a sequencer as in [9]. In a
sequencer-based protocol, messages are sent to a sequencing node, which places a
sequence number on each message and then broadcasts it to all the members of the group.
The sequencing node could either be a member of the group, a separate process or the 
responsibility could even be passed around the group. There are two possible 
implementations: ordering the right to transmit or ordering the messages at some 
central location. Token passing orders all the members of the group, and ensures that 
each knows the current global sequence number. The participants are ordered in a 
virtual ring. A token, containing the current sequence number, passes round the ring 
and when the token is received, the receiving process can then send messages before 
updating and passing the token on. Server based implementations have a fixed point 
that acts as the sequencer. All messages are sent to the server, which adds a sequence 
number and then broadcasts the message to all group members. If the sequencing 
node must have more state than the sequencing id, e.g. because there is some 
additional processing required for the application, then the server based model is often 
more appropriate, since the token may become very large. 
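
The server based variant can be sketched in a few lines of Python. The class names and the in-memory inboxes that stand in for the network are assumptions made for illustration; this is not the protocol of [9].

```python
import itertools


class Sequencer:
    """Central node that stamps each message with the next sequence number."""

    def __init__(self):
        self._numbers = itertools.count(1)

    def stamp(self, sender, payload):
        return (next(self._numbers), sender, payload)


class Group:
    """Hypothetical group whose members all receive every stamped message."""

    def __init__(self, members):
        self.sequencer = Sequencer()
        self.inboxes = {member: [] for member in members}

    def send(self, sender, payload):
        # Every message goes through the sequencer before being broadcast.
        message = self.sequencer.stamp(sender, payload)
        for inbox in self.inboxes.values():
            inbox.append(message)
        return message


group = Group(["a", "b", "c"])
group.send("b", "hello")
group.send("a", "world")
print(group.inboxes["c"])
```

All members see the same sequence numbers, so a receiver can detect gaps and reorder messages locally.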

Token passing ensures fairness of access to the sequencing token but suffers from 
the fact that the sequencer may spend most of its time traveling between the group 
members without doing any useful work. Access to the sequencer becomes more and 
more limited as group sizes increase. Token passing works well for small high-speed
networks with few members, but fails when either the distribution or the number of
members increases. Server based implementations have a separate set of problems.
The first is that distance from the sequencer becomes a fairness factor. If two 
members are disparate in their distance from the sequencer, the closer of the two will 
not only get their messages sequenced first but will get earlier access to all of the 
sequenced messages. A second problem is bandwidth overload. The server must 
receive all messages from all of the senders and process them. The network link to the 
sequencer may become flooded and access will be limited. There are alternative 
techniques based on identifier negotiation, such as ABCAST [2], which require
multiple rounds of negotiation before deciding upon an identifier and are unsuitable
for low latency wide area multicast.

As can be seen, these total ordering models appear to break down, by becoming 
unfair or uneconomical, under conditions of wide distribution with differing latencies 
for each member. Active Networking techniques have much to offer in improving this 
situation, making total ordering sensible and fairer over much more varied group sizes 
and distributions. Active network techniques can be used to solve these problems 
through providing processing within the network. 

There are many varied interpretations of, and approaches to, what active networking
(AN) is and how it should be implemented [10-14]. At its base, AN is the addition of
processing power and soft-state storage to the nodes within the switching fabric. AN
technology allows packets to be manipulated on-the-fly based on their state and the
state of the network which they are passing through. The manipulation occurs either 
by code pre-loaded onto the switch in some way, or by code actually carried by the 
packet along with its data. AN not only allows new protocols to be developed and 
rolled out quickly, but also opens the door to a whole new genre of protocols. These 
new protocols have the ability to manipulate packets based upon state-information 
stored at each node the packet passes through as well as information stored in the 
packet. Packets can thus be modified, destroyed, combined, split etc. as they pass 
through the network. This breaks many of the laws of current packet-switched 
networking and its consequences have not yet been fully explored. 

Applications requiring low-latency, ordered, reliable, group communication need 
processing and storage support within the network infrastructure. Active networking 
can provide these facilities within the networks. With these additional services, it 
becomes possible to break the current paradigm and design a new breed of protocols 
using the established canon of distributed computation work as in [15]. ATOM is one 
such protocol as outlined below. 



2 Protocol Description 

ATOM is an atomic broadcast protocol, in which the sequencing node is auto- 
configured within a shared multicast tree so as to provide optimal fairness. The 
shared tree is built using techniques such as described in CBT [16]. Join messages are 
sent to some well-known root for the tree, and processing and state at the intermediate 
nodes maintain the tree over time. 

In designing ATOM, we decided that a good measure of fairness was the 
minimisation of the sum of the squares of the distance from any node to the sequencer. 
By minimising this sum, we have an optimal placement of the sequencer. ATOM thus 
uses the following cost function to determine which node in the tree should act as the 
sequencer for each given node. In the equation, N is the number of members and d(i)
represents the distance between the given node and member i:

$$ C = \frac{1}{N} \sum_{i=1}^{N} d(i)^2 \qquad (1) $$

The tree is assumed to be a shared multicast distribution tree with no circuits.
Under these conditions this parabolic weighting function gives a single minimum,
except where the minimum is halfway between two of the nodes (each would have the
same cost). ATOM is designed so that the sequencer is self-optimising. During the
lifetime of a group the sequencer will move to the position in the network with the 
lowest cost. 
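
As an illustration of the placement criterion, the Python sketch below evaluates the squared-distance cost at every node of a small tree by brute force and picks the minimum. The adjacency-list representation, the example topology and the division by the number of members are assumptions; the normalisation does not change which node is the minimum.

```python
from collections import deque


def distances_from(tree, start):
    """Hop counts from start to every node of an undirected tree (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in tree[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def cost(tree, members, candidate):
    """Mean squared distance from the candidate sequencer to the members."""
    dist = distances_from(tree, candidate)
    return sum(dist[m] ** 2 for m in members) / len(members)


def best_sequencer(tree, members):
    """Node with the lowest cost (exhaustive search, for illustration only)."""
    return min(tree, key=lambda node: cost(tree, members, node))


# Hypothetical path a - b - c - d - e with members located at a, b and e.
tree = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
members = ["a", "b", "e"]
print(best_sequencer(tree, members))
```

ATOM itself does not search exhaustively; it only compares the current sequencer with its immediate neighbours, which is sufficient because the cost has a single minimum on a tree.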

Weightings are passed from the fringe towards the sequencer. Because of the 
parabolic nature of the cost function, we only need to be able to calculate the cost 
value at the sequencer and its immediate neighbours to determine if the current 
sequencer is at the minimum. If not, then the sequencing responsibility is passed to 
the lowest cost neighbour. The equations to work out this information at node i are
very simple; from the cost at one node we can infer the cost at the next node using the
equations below. In these equations NoMem_i is the number of group members down
the branch and NoMem_{i-1} is the current number of leaf nodes from the previous node
(this is not modified when passing on). SumDist_{i-1} is the sum of the distances reported
from all of the downstream inner nodes and SumDist_i is thus the distance to all
downstream nodes at i. SumSquares_i is the sum of the squared distances to each
member, as reported by downstream inner nodes. N is the number of members at
the current node.



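$$
\begin{aligned}
\mathit{NoMem}_i &= \mathit{NoMem}_{i-1} + N \\
\mathit{SumDist}_i &= \mathit{SumDist}_{i-1} + \mathit{NoMem}_i \\
\mathit{SumSquares}_i &= \mathit{SumSquares}_{i-1} + 2\,\mathit{SumDist}_{i-1} + \mathit{NoMem}_i \\
C_i &= \frac{\mathit{SumSquares}_i}{\mathit{NoMem}_i}
\end{aligned}
\qquad (2)
$$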

Costs from sub-branches that connect at a node can simply be summed to give the 
equivalent value as if all the branches were collapsed into one. Only the joining or
leaving of a member affects these weights.
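
The incremental update can be checked with a short Python sketch. It assumes a single chain of nodes leading towards the sequencer and assumes that each member is one hop away from its local active node, an interpretation under which the recurrence reproduces the exact sums; the function names are hypothetical.

```python
def step(prev, n_local):
    """Apply the recurrence once.

    prev    -- (no_mem, sum_dist, sum_squares) reported by the previous node
    n_local -- N, the number of members attached to the current node
    """
    no_mem_prev, sum_dist_prev, sum_sq_prev = prev
    no_mem = no_mem_prev + n_local
    sum_dist = sum_dist_prev + no_mem
    sum_squares = sum_sq_prev + 2 * sum_dist_prev + no_mem
    return no_mem, sum_dist, sum_squares


def cost(state):
    """Cost at a node: sum of squared distances divided by member count."""
    no_mem, _, sum_squares = state
    return sum_squares / no_mem if no_mem else 0.0


# Chain of three nodes towards the sequencer with 2, 0 and 1 local members.
state = (0, 0, 0)
for n_local in [2, 0, 1]:
    state = step(state, n_local)

# Direct calculation: the members are 3, 3 and 1 hops from the last node.
direct = [3, 3, 1]
print(state, (len(direct), sum(direct), sum(d * d for d in direct)))
```

Each node only needs the three numbers reported by its downstream neighbour, so the amount of state carried towards the sequencer does not grow with the size of the group.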

The protocol is designed to work efficiently over wide area networks. With this in
mind it was decided to use a sparse mode spanning tree for group information and
packet distribution. This implies that a node in the network explicitly enters or leaves
a group, and that at any time (except during partition) all of the nodes of the group are
connected directly to other nodes within the group, i.e. all joins and leaves are
explicit. Thus the group must have a point, known to be part of the group, to which a
node can explicitly connect. We will call this point the rendezvous point (RP).
Once all of the nodes between a member and the RP are in the group, the member can
be considered part of the group. This ensures connectivity between the group
members. It is assumed that if a member cannot connect itself to the RP it cannot join
the group. For this to happen the connections between the nodes must be reliable; we
assume that this is implemented either with timers or with some appropriate
communications sub-system.

So, at any time we can see a totally connected sub-set of nodes within the network
topology that represents the current group. From here we can describe the basic events
that occur during the group lifetime, after which we will explore error recovery from
partitioning and node failure.

Member Join: When an application joins a group it signals its interest by sending a
message to the local active node. This node does one of two things: if it is already in
the group, it updates its weightings and passes this information towards the sequencer.
If not, the node attempts to join the group. It does this by sending a join to the node
that is the next hop towards the RP. This node then repeats the sequence: if it is
already in the group it updates its weightings; if not, it also attempts to join the
group. This process continues until either a node that is already part of the group is
reached or until the RP is reached. If the RP is not currently part of the group it
initialises a sequencer and generates an optimization event.
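
The hop-by-hop join can be sketched as follows in Python. The set of on-tree nodes, the next-hop table and the node names are hypothetical, and weight updates and sequencer initialisation are left out of the sketch.

```python
def join(group, next_hop_to_rp, node, rp):
    """Add node, and every hop towards the RP, until an on-tree node is met.

    Returns the list of nodes that were newly added to the group.
    """
    added = []
    while node not in group:
        group.add(node)
        added.append(node)
        if node == rp:
            break
        node = next_hop_to_rp[node]
    return added


# Hypothetical routing: x -> y -> z -> rp, and w -> y.
next_hop = {"x": "y", "y": "z", "z": "rp", "w": "y"}
group = set()
first = join(group, next_hop, "x", "rp")   # builds the whole branch to the RP
second = join(group, next_hop, "w", "rp")  # stops at y, which is already on-tree
print(first, second)
```

The second join stops after one hop because its next hop is already in the group, which is what keeps join traffic local once a tree exists.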

Member Leave: When an application leaves the group, it informs its local active
node, which then removes it from the active set of members. This node then checks to
see if it should still belong to the group; if so it updates its weighting, if not it leaves
the group. These leaves happen recursively until a node is reached that should remain in
the group or until the RP is reached. If the RP has no members then it shuts down and
frees the sequencer resources.

Data Message: A data message sent by one of the members is forwarded by each
node it encounters towards the sequencer. When the sequencer is reached, it adds a
sequence number to the packet and multicasts the message back down the tree to all
of the members. If for any reason messages arrive at a receiver out of order they can be
resequenced by the receiver using a holdback queue.
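
A holdback queue of the kind mentioned above can be sketched in Python as follows. The class and method names are hypothetical, and sequence numbers are assumed to start at 1.

```python
class HoldbackQueue:
    """Buffers out-of-order messages and releases them in sequence order."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq
        self.pending = {}

    def receive(self, seq, payload):
        """Accept one message; return the payloads that are now deliverable."""
        if seq >= self.next_seq:
            # Keep the first copy only, so duplicates are ignored.
            self.pending.setdefault(seq, payload)
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


queue = HoldbackQueue()
out = [queue.receive(2, "b"), queue.receive(1, "a"), queue.receive(1, "a"),
       queue.receive(4, "d"), queue.receive(3, "c")]
print(out)
```

Message 2 is held back until message 1 arrives, and the repeated copy of message 1 is discarded because it has already been delivered.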

Weight Update: Weight updates recursively update the tree until the sequencer is 
reached. The sequencer then re-calculates its position. If one of the neighboring 
nodes has a lower cost value the sequencer initiates a sequencer move to that
position. If the two nodes have an equal value then the sequencer prefers the one 
closer to the RP (meaning that it tracks toward the RP when there are no members). 
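
The decision taken at the sequencer after a weight update can be sketched in Python. The sketch assumes that the costs of the sequencer and of its neighbours have already been computed and that the hop distance of each node to the RP is known; the function and parameter names are hypothetical, and hysteresis is not modelled.

```python
def next_sequencer(current, costs, neighbours, hops_to_rp):
    """Return the node that should hold the sequencer after a weight update."""
    best = current
    for node in neighbours:
        if costs[node] < costs[best]:
            best = node
        elif costs[node] == costs[best] and hops_to_rp[node] < hops_to_rp[best]:
            # Equal cost: prefer the node that is closer to the RP.
            best = node
    return best


costs = {"s": 3.0, "n1": 2.5, "n2": 3.0}
hops = {"s": 1, "n1": 2, "n2": 0}
print(next_sequencer("s", costs, ["n1", "n2"], hops))
```

If the function returns the current node, the sequencer stays where it is; otherwise the sequencing responsibility moves one hop, and the comparison is repeated from the new position at the next update.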

The protocol must also take into account the possibility of failure and partitioning
within the group. The approach to this taken by ATOM is simple. If the group
becomes partitioned then the partition that cannot reach the sequencer is killed off;
members in the disconnected partition will be informed and can explicitly attempt to
re-join the group. Members in the partition on the connected (to the RP)
side of the break are informed and are kept in the group. This is a very simple
approach but gives continuing service to connected members. It is designed so the
application layer can take specific action in the event of such a major failure.



3 Implementation 

From the protocol specification a testbed implementation was designed and built. 
This implementation was built in a simulation environment using several public tools. 
NetSim [18] is a network simulator similar in function and purpose to NS [19]. It is
written entirely in Java and is targeted towards the testing and implementation of
active network systems rather than the testing of traffic flow. This tool, like NS,
uses the LBL Nam network animator [20] to show the traffic flow during a
simulation.

Active Network protocols are implemented as Java classes which inherit from a
standard packet handler class. This is very similar to the approach taken by ANTS
[11] but ignores the loading of the protocol across the network. It was decided to
ignore other approaches which loaded the packets with the handling code (such as
Switchware [12]) because of the increase in packet sizes that would have ensued.
To aid implementation, an active communication framework sub-system called
AFrame [17] was used. AFrame provides reliable ordered communication between
neighbouring nodes within the network. AFrame also keeps track of whether a
neighbour has failed and provides communication information (such as round-trip- 
times). This makes implementation much easier as simple packet loss can be ignored. 
Only when communication problems become chronic does a protocol need to step in 
and take action. 



4 Simulation and Evaluation 

We have carried out a number of experiments in the design and refinement of ATOM 
and AFrame. For the purposes of this paper, we will use the results from a single 
experiment to illustrate how ATOM achieves near optimal placement of the 
sequencer. 




[Figure 1: plot against Time (s); series: Optimal, ATOM, Worst, Edge]

Fig. 1. ATOM vs. Optimal, Worst and Edge Weightings
The experiment that we carried out shows the optimisation of ATOM compared not
only to the optimal costs, but also to a fringe based server. This experiment was
carried out on a balanced tree where all of the members were at the fringe of the
network (i.e. at the leaf nodes). To show the optimising effect the members were
configured to join and leave in a tidal manner, moving from one sub-tree to another.
For this experiment we are interested in three sets of values: firstly the
optimal cost values within the network, secondly the costs of the placement of the
sequencer by ATOM and lastly the costs generated by a fringe based server (position
fixed at the edge of the network).

In this experiment we can calculate the values of the fringe and optimal costs for 
the given network configuration (i.e. topology and membership) whereas the ATOM 
values are taken from the simulation. Figure 1 shows the results of a typical
experimental run. Note that the fringe value is widely different from the optimal and
ATOM results. If all of the members of the group were near this point the correlation
would be closer, but with a broadly distributed group the performance is very poor
compared to a localised group. By comparison, ATOM tracks the changes of the optimal
line closely: whenever the group changes, ATOM moves the sequencer towards the optimal
position. This is not instantaneous. We have not only the communication latencies
but also a designed measure of hysteresis, which stops the sequencer oscillating
widely if the group changes rapidly.

Figure 2 shows a closer view of ATOM values compared to the optimal values. 
The behavior of the protocol can now be seen much more clearly. When the optimal 
position changes the ATOM sequencer moves on a hop-by-hop basis toward the 
optimal position. This can be seen in the ‘stair-stepping’ effect: each step shows
the movement of the sequencer between one node and the next. The width of the step 
is not only the communication latency between the two nodes, but also the built in 
hysteresis. Unless the group is changing continually ATOM will eventually reach the
optimal position.




Fig. 2. ATOM vs. Optimal 

5 Conclusions

In this paper we have shown how active networking can provide support for the 
implementation of distributed computation. In particular, we have presented a novel 
algorithm for optimising placement of a sequencer for an atomic broadcast algorithm, 
and shown how the algorithm can be integrated within a simple shared tree multicast 
algorithm. 

It is important to note that the square of the distance is not the only possible cost 
function. As long as the cost function is exponential we can use any measure that is 
appropriate. Another good example would be the minimisation of bandwidth using a
function similar to the one below, where h(i) represents the hop-count to the
sequencer, p(i) represents the proportion of traffic generated as a fraction of the
total traffic, and N represents the number of members in the group. The hop-count
needed to broadcast the message is a constant and can be ignored. 

$$ C = \frac{1}{N} \sum_{i=1}^{N} h(i)\, p(i) \qquad (3) $$

These functions can be dynamic. Group membership or bandwidth usage (e.g. 
video-conferencing) can change dynamically over time. Allowing the sequencer to 
move during the lifetime of the group allows the sequencer placement to be 
continually optimised. 

ATOM has been compared against its closest competitors and has shown itself to 
be superior for this group of applications. The addition of active services within the 
network breaks the old model of network communications and allows for new families
of protocols. These protocols can do several things including self-organising and 
optimising, although work is still required on how much power and storage should be 
made available and how it should be utilised. ATOM is a demonstration of the 
potential power of such approaches to break old paradigms and create new ones. 
Work is currently being undertaken to determine how to adapt the degree of hysteresis 
to group size, and on how to integrate alternate cost functions. 

Abstract. We describe an active application in the field of multicast
congestion control for real-time traffic. Our Active Layered Multicast 
Adaptation Protocol is a layered multicast congestion control scheme 
built on top of an Active Network infrastructure. It benefits from router 
support in order to obtain information about resources available and to 
perform the adaptation tasks at the places where shortages of resources
occur. It supports heterogeneous receivers through the combination of
layered multicast transmission with selective filtering and pruning of lay- 
ers within the active nodes. Market-based resource management ideas are 
applied to achieve a resource utilisation level that represents an equilib- 
rium between the user goals and the node operator goals. Our simulation 
results show that the protocol is feasible and provides adequate reactions 
to short term and persistent congestion, while keeping the amount of 
state and processing in the active nodes limited. 



1 Introduction 

We are interested in the use of active networks (ANs) [4,22] in the context of 
adaptive multicast protocols. Since no single solution for multicast seems able 
to satisfy all applications, it seems more natural to adopt an approach in which 
multicast protocols can be dynamically loaded according to the needs of the 
applications and users. Active networks are especially suited for this task. 

An adaptive protocol must be able to make optimal use of the available re- 
sources, and to accommodate fluctuations in resource availability. Here again, 
active networking can play an important role, since it becomes possible to in- 
ject customised computations at optimal points in the network, to facilitate the 
adaptation process. 

In this paper we describe an active application (AA) in the field of multicast 
congestion control. We consider a soft real-time, distributed multicast applica- 
tion composed of sources and receivers that exchange capsules containing mobile 
code which is executed in the active nodes along the path. Our Active Layered 
Multicast Adaptation (ALMA) protocol supports heterogeneous receivers, i.e. 
receivers that have different terminal characteristics or experience different network
conditions, by making use of layered multicast [17]. In a layered multicast
scheme, the source divides its data into a number of layers in a hierarchical way, 
such that the most important information is carried by the lowest layer (base 
layer), and the higher layers successively enhance this information. 
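
Because each layer only enhances the ones below it, a receiver or filtering node can take layers cumulatively from the base layer upwards until the available bandwidth is exhausted. A minimal Python sketch of that selection, with hypothetical layer rates in kbit/s:

```python
def layers_within(layer_rates, available):
    """Number of layers, counted from the base layer, that fit in `available`."""
    used = 0.0
    count = 0
    for rate in layer_rates:
        if used + rate > available:
            break  # higher layers are useless without this one
        used += rate
        count += 1
    return count


# Hypothetical encoding: base layer 64 kbit/s, enhancements 128 and 256 kbit/s.
rates = [64, 128, 256]
print(layers_within(rates, 200))
```

The loop stops at the first layer that does not fit, since skipping a layer and taking a higher one would break the hierarchical dependency.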

The rest of the paper is organised as follows: Section 2 provides some back- 
ground on research related to this work. Sect. 3 describes the ALMA AA, Sect. 
4 shows some results obtained so far, and Sect. 5 concludes the paper. 

2 Background 

2.1 Resource Management 

One of the main fears towards active networks is the time dedicated to mo- 
bile code processing: this seems incompatible with the increasing bandwidth 
availability in the network backbone, which requires faster and therefore less in- 
telligent routers. However, the access networks do not follow the growing speed 
of the backbones. There, heterogeneity is becoming even more prominent, in- 
cluding fixed and mobile dial-up access, ISDN, xDSL, cable networks, etc. A 
price measure seems to be a fairly good reference to reflect these differences 
in availability and cost of resources. Applying simplified economic principles of
supply and demand, it is possible to make the price indicate the availability of a
given resource. Note that the price is only an arbitrary reference that enables
comparisons and trading between different types of resources, and thus does not
necessarily translate to real-world prices. For example, an application could trade
CPU time for bandwidth consumed while crossing a high-bandwidth backbone 
(where CPU processing is expensive and bandwidth is cheap), while it could 
favour CPU consumption where bandwidth is scarce (thus expensive) and CPU 
time cheap. 
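
The trade-off in this example can be illustrated with a small Python sketch in which an application compares the priced cost of two ways of handling a flow. The option names, resource amounts and prices are invented for illustration and are not taken from any of the cited schemes.

```python
def cheapest_option(options, prices):
    """Pick the option whose resource needs cost least at the given prices."""
    def cost(option):
        return sum(prices[res] * amount for res, amount in option["needs"].items())
    return min(options, key=cost)["name"]


# Hypothetical choices: forward the flow unchanged, or spend CPU to filter it.
options = [
    {"name": "forward", "needs": {"cpu": 1, "bandwidth": 10}},
    {"name": "filter", "needs": {"cpu": 5, "bandwidth": 4}},
]
backbone = {"cpu": 4.0, "bandwidth": 1.0}   # CPU expensive, bandwidth cheap
access = {"cpu": 1.0, "bandwidth": 5.0}     # CPU cheap, bandwidth scarce
print(cheapest_option(options, backbone), cheapest_option(options, access))
```

With the same application logic, the choice flips between the two nodes purely because the local prices differ, which is the behaviour the price reference is meant to enable.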

This gives rise to a market-based model for resource allocation. In such a 
model, two opposite forces act to seek the global system equilibrium by opti- 
mising their own benefits: on one side we have the network elements, whose 
interest is to maximise resource usage (since that brings them revenues) while 
maintaining a good performance level (in order to keep the clients satisfied). On 
the other side, we have the users (active applications), who seek to obtain a bet- 
ter quality/price relation for the resources consumed, and to efficiently manage 
their own budgets avoiding waste. 

Several algorithms inspired by operational research and economic theories
have been proposed to control resource usage in networks [7]. These algorithms 
are able to converge towards a globally optimal resource allocation in a decen- 
tralised way. In [9,14,15] such theories are applied to the problem of end-to-end 
congestion control, i.e. where bandwidth is the main scarce resource. In [9] an 
optimisation framework is used to derive a class of optimal algorithms inside 
which TCP, after some modification, can be seen as a special case. In [14] a 
thorough stability and fairness analysis of some optimisation-based rate control 
algorithms is presented, and it is shown that these algorithms implement proportionally
fair pricing. In [15] a similar algorithm is proposed, and in a more recent
work [1] it is adapted to the Internet environment, using a packet tagging scheme 
to communicate link price information to the end hosts. The results shown are 
promising since they are generic enough to be adapted to a wide variety of appli- 
cations. However, their direct application to discrete layering multicast schemes 
such as the one we present in this paper, is not straightforward due to fairness 
issues, as pointed out in [20]. 

Closer to the AN perspective, in [23] an open resource allocation scheme 
based on market models is applied to the case of memory allocation for mobile 
code. In [11] an adaptive QoS scheme for MPEG client-server video applications 
is described. It is based on intelligent agents that reserve network bandwidth 
and local CPU cycles, and adjust the video stream appropriately. In [26] a mar- 
ket model to allocate QoS is applied to a conferencing tool targeted at casual 
meetings where sudden variations in bandwidth availability require an adaptive 
QoS control strategy. 

A cost model for active networks is proposed in [18], which expresses the 
trade-off between different types of resources in a quantitative way. However,
due to the recursive approach adopted, the model seems more appropriate for
applications that make use of resource reservation, instead of highly adaptive
ones. Furthermore, the use of such a model with multicast needs to be clarified.

2.2 Multicast Congestion Control 

Congestion control can be regarded as a special case of resource management 
which considers mainly bandwidth as a scarce resource. It has been mentioned 
several times as a potential area of application that can benefit from active 
networks. End-to-end congestion control is an extremely difficult problem, par- 
ticularly on the Internet where it is aggravated by the absence of information 
about the network. In an active network, capsules can collect information about 
network conditions [5] and active filters can be installed to adapt between portions
of the network subject to different conditions [13]. But to what extent
such additional information is helpful, and what extra burden it might bring, are
still open questions.

Multicast sessions can be either single rate, if a single flow is generated for 
all the receivers, or multirate, when several flows are generated and the receivers 
get a subset of flows according to their preferences and constraints. If these flows 
are alternatives to each other, we have a simulcast scheme. If they complement 
each other in a hierarchical way, we have a layered scheme. Layered multicast is 
typically proposed for video, but it has a potential for any kind of application 
that can divide a single flow into subflows of different priorities, e.g. an animation 
stream, bulk data transfers, etc. 
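
The difference between simulcast and layered transmission on a shared link can be illustrated with a short Python sketch. The rates are hypothetical, quality levels are numbered from 0, and the sketch assumes that layered encoding adds no overhead, so that layers 0 to k together carry the same rate as the simulcast stream of quality k.

```python
def simulcast_load(stream_rates, requests):
    """Link load when every distinct requested quality is sent as its own stream."""
    return sum(stream_rates[q] for q in set(requests))


def layered_load(layer_rates, requests):
    """Link load when only the layers up to the highest requested one are sent."""
    return sum(layer_rates[: max(requests) + 1])


stream_rates = [64, 192, 448]   # kbit/s, one independent stream per quality
layer_rates = [64, 128, 256]    # kbit/s, cumulative sums equal stream_rates
requests = [0, 2, 2]            # quality requested by each downstream receiver
print(simulcast_load(stream_rates, requests), layered_load(layer_rates, requests))
```

On this link simulcast carries the base-quality content twice, once on its own and once inside the high-quality stream, whereas the layered scheme carries it once.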

Multicast congestion control schemes can be divided into two classes: pure 
end-to-end schemes, and router-assisted schemes. In a pure end-to-end layered 
multicast scheme [17,24], the source transmits a hierarchically encoded stream, 
and the receivers subscribe/unsubscribe to a number of layers according to the 
observed network conditions (loss rate, etc.). The RLM protocol [17] is one of
the pioneers and best-known protocols in this category. It is based on
so-called join experiments to probe for available bandwidth, and on the observed
loss rate to abandon layers in case of congestion. RLC [24] is a more recent 
protocol that also addresses friendliness towards TCP flows on the Internet. 

The existing end-to-end layered schemes achieve very good results considering
the limitations of pure best-effort networks, which make up most of the Internet
today. However, they have many drawbacks such as: slow and/or coarse-grain 
adaptation; unstable behaviour characterised by subscribe/unsubscribe oscilla- 
tions; the need to allocate and manage several multicast groups; random packet 
drops that lead to poor quality due to hierarchical dependence among pack- 
ets from different layers; probing for additional bandwidth has a potential to 
intensify congestion; difficulty to synchronise among receivers leading to under- 
utilisation of bandwidth. 

Recently, more attention has been devoted to router-assisted schemes [3],
which can count on router support in order to solve the above-mentioned problems.
In [10] an extension of RLM is proposed, which makes use of two priority levels: 
the upper layer, which is less important than the others, is always assigned a low 
priority. This creates a “bumper layer” that absorbs most of the packet losses. 
Unnecessary join experiments are eliminated in this way, leading to a very stable 
behaviour even when the layers have a variable bit rate profile. 

In [16] a router-assisted congestion control scheme for reliable multicast is 
sketched. The authors propose a very interesting filtering scheme based on a nor- 
malised inverse sequence number which is mathematically the smoothest scheme 
possible. They assume that the router computes the fair shares for all the ses- 
sions. Their signalling scheme is similar to ours, but an explicit rate is used 
instead of layers. 

For a router-assisted scheme to be useful, it needs to be widely accepted and 
deployed, or at least deployed in critical points in the network, which anyway re- 
quires a long standardisation process and deployment time. Since most multicast 
sessions are naturally heterogeneous, it is difficult to agree on a single solution 
or set of solutions. That is where active networks can play a role, since we can 
design customised solutions and let them evolve through usage experience. 

In [6] a unicast scheme for AN-based congestion control is presented, targeted
at improving TCP performance over links with a high bandwidth-delay product.
In [12] a single rate reliable multicast scheme with congestion control is
presented in the form of an active service. Within this scheme, the source adapts
its sending rate to the rate that can be supported by the “nominee”, which is
the weakest receiver in the session. In [2], a layered video multicast protocol is
implemented over two AN platforms: ANTS and M0. Although the goal of that
work was to compare the performance of the two AN platforms, it is important
to note that their signalling scheme to join layers is identical to ours.

Many AN example applications are related to information filtering or
transcoding in the active nodes [13,21]. The idea common to these examples
is to show how more intelligent functionality such as selective packet
discarding and transcoding in the routers can help improve the reception quality
of a multimedia stream. In all these examples, the source of congestion does not
react in order to prevent overflow and discarding. Filtering in the intermediate
active nodes alleviates downstream congestion, and is important for
heterogeneous multicast, since it enables the adaptation of one original stream
for several different groups of receivers. However, filtering alone is obviously
not a solution to the congestion control problem, since it does not tackle the
source of the congestion.



3 ALMA Protocol 

Active Layered Multicast Adaptation (ALMA) is a multicast congestion control 
protocol implemented as an AA. It supports heterogeneous receivers through 
the combination of layered multicast transmission with the selective filtering 
and pruning of layers within the active nodes. In previous work [28] a case study 
was conducted, which led to an initial version of ALMA. In this paper we refine 
the protocol and present additional results. 

3.1 Requirements and Assumptions 

We would like to create a model for adaptive applications, that can benefit from 
the facility of retrieving the necessary information about network conditions 
using the router support that can be provided by an active network. It is well 
known that feedback information for congestion control must be used with care, 
in order to avoid generating extra packets that might worsen congestion. It is 
also known that state and processing in the active nodes might be expensive, 
and therefore should be reduced to the minimum necessary. We set these two 
principles as requirements for our protocol. 

Since our target is a delay-sensitive application, and for efficiency reasons, 
we also impose the requirement of not delaying packets inside the active routers. 
Additionally, the protocol should not rely on specialised schedulers that decide 
how to handle flows in a centralised way. The decision must be built into the ses- 
sions themselves, taking into account their preferences and resource availability 
information. No assumptions are made regarding the exact traffic specifications 
for each session. 

We assume a pure AN environment in which all packets are active. It is 
possible to optimise it in several ways, but for the moment we would like to con- 
centrate on the protocol functionality from a conceptual point of view. Therefore 
we try to be as generic as possible without imposing the constraints of a real 
implementation. Furthermore, we assume that all nodes are active. Although 
this assumption seems unrealistic, the use of an equivalent link abstraction [21] 
enables us to easily migrate to a mixed scenario of active and non-active nodes, 
while at the same time keeping a pure AN abstraction so that we do not need 
to worry about implementation and interoperability constraints in the design 
phase. 

3.2 Protocol Basics

We consider a layered multicast application composed of a single source and 
several different receivers. The source generates delay-sensitive traffic encoded 
in a hierarchical way such that lower layers are more important than higher 
layers, and the higher layers are useless without the lower layers. The generated 
data is carried by capsules which also contain the instructions to process it inside 
the active nodes along the path to the receivers. The data capsules are organised 
in layers according to the hierarchical position of the data they carry. This is 
similar to conventional layered transmission [17,24] except that in this case, since 
customisable capsules are used, it is feasible to use a potentially large number of
layers. Besides that, the semantics of the relationships among the layers is built 
into the scheme itself, facilitating its customisation to the specific characteristics 
of a given application. 

The multicast routing mechanism employed is a simple form of source-based 
sparse-mode scheme similar to the well-known AN multicast example from 
ANTS [25]. For simplification purposes, no group address is used. Receivers 
subscribe directly to the address of the source they wish to receive data from. 
While subscribing, a receiver specifies the subscription level it wishes to obtain, 
which indicates how many layers it wishes to receive, as in [2]. This requires
only one type of capsule: a subscribe capsule, which takes two parameters: the
source address and the desired subscription level. To abandon all layers, a
receiver spawns a subscribe capsule carrying zero as the subscription level. The
main advantage of this subscription model is that a group of streams can have 
its semantics understood inside the network nodes as layers belonging to the 
same session. Besides that, such a simple subscription model allows an arbitrary 
number of layers to be used, since the amount of multicast forwarding state 
left in the nodes does not increase with the number of layers, as opposed to 
what would happen in conventional layering schemes using IP multicast. The 
disadvantage is that the processing of a subscription request in a node is more 
complex, since it must compute the upstream subscription level for a session, 
which is the maximum of the subscription levels at each outgoing interface that 
has members of the session. 
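
The computation of the upstream subscription level can be illustrated with a minimal sketch. It is written in Python for readability only (the capsules in this work are not written in Python), and the names `SessionState`, `subscribe` and `upstream_level`, as well as the interface labels, are hypothetical. The point it tries to show is that a node stores one level per outgoing interface, whatever the number of layers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionState:
    """Per-session forwarding state in one active node (hypothetical layout)."""
    # Subscription level currently requested through each outgoing interface.
    levels: Dict[str, int] = field(default_factory=dict)

    def upstream_level(self) -> int:
        # Maximum over the interfaces that still have members; 0 if none.
        return max(self.levels.values(), default=0)

    def subscribe(self, interface: str, level: int) -> Optional[int]:
        """Process a subscribe capsule that arrived from `interface`.

        Returns the new upstream level if it changed (the capsule then has to
        continue towards the source), or None if it can stop at this node.
        """
        before = self.upstream_level()
        if level > 0:
            self.levels[interface] = level
        else:
            # Level zero means that this interface abandons all layers.
            self.levels.pop(interface, None)
        after = self.upstream_level()
        return after if after != before else None
```

In this sketch a request for 2 layers arriving on an interface of a node that already forwards 5 layers elsewhere stops locally, whereas the departure of the interface holding the maximum causes a lower level to be propagated upstream.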

Complete end-to-end congestion control is achieved by a combination of several
efforts: (i) link outgoing interfaces export a price that is calculated as a
function of the link characteristics and current load; (ii) data capsules filter
themselves by comparing the current link price with their own budget; (iii) data
capsules prune multicast tree branches affected by persistent congestion; (iv)
receivers spawn subscribe capsules to probe for additional bandwidth.

3.3 Filtering Mechanism 

Link outgoing interfaces export an instantaneous price for the usage of the link. 
The price is calculated as a function of the link characteristics and current total 
link load (of all traffic traversing the link) . The price in this case is no more than 
an indication of the level of congestion at a given interface. 

The budget assigned to a data capsule is directly proportional to the marginal
utility of adding its layer, which in our case expresses the relative importance of
its layer with respect to the other layers. Since the marginal utility of adding
an individual layer decreases as we go towards higher layers, the budget of each
capsule decreases too. Therefore, a higher layer capsule is more likely to
discard itself due to a rise in link prices than a lower layer capsule. The
resulting behaviour is similar to a priority dropping mechanism, or a weighted
RED [8]. Such a simple filtering mechanism does not require specialised
schedulers; it only requires the link to export a price function that varies
with the load on the link.
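
The self-filtering rule can be sketched as follows. The shapes of `budget` and `price` below are assumptions chosen only for illustration (a budget inversely proportional to the layer number, and a price that is quadratic in the load); they are not the functions used in the simulations. Only the comparison in `forward` reflects the mechanism described above.

```python
def budget(layer: int, base: float = 1.0) -> float:
    """Hypothetical budget: decreases with the layer number (1 = base layer)."""
    return base / layer


def price(load: float) -> float:
    """Hypothetical price: grows with the link load, a fraction in [0, 1]."""
    load = min(max(load, 0.0), 1.0)
    return load ** 2


def forward(layer: int, load: float) -> bool:
    """A data capsule keeps itself only if its budget covers the link price."""
    return budget(layer) >= price(load)


def surviving_layers(load: float, layers: int = 5) -> list:
    """Layers whose capsules would pass an interface at the given load."""
    return [layer for layer in range(1, layers + 1) if forward(layer, load)]
```

With these assumed functions, all five layers pass a lightly loaded interface, and the higher layers drop out first as the load rises, which is the priority-dropping behaviour mentioned in the text.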

Figure 1 shows an example of a budget function, used to calculate the budget
to be assigned to each layer, and an example of a price function for a
point-to-point link, inspired by [23]. These are the ones we used in the
simulations described in Sect. 4.




[Figure 1 plots: budget versus layer number (left); price versus load in % (right)]

Fig. 1. The budget function to be applied to each layer (left), and the price
function maintained by the node interfaces (right)



The reason to export a price and not the link load directly is that an abstract 
price function gives more freedom for the network managers to adjust its shape 
to offer different incentives to the users. This can work in a transparent way 
with different kinds of links. For example, in a shared Ethernet link the local 
interface load is not a good indication of the actual link utilisation, therefore 
another parameter should be used, such as the number of collisions. 

Such filtering is performed at the time granularity of a capsule, therefore it 
must be based either on instantaneous or short term average values (i.e. of the 
order of a few capsules). 

In the simulations presented in this paper we use instantaneous values be- 
cause it is simple and completely eliminates packet drops at the outgoing links. 
Indeed we confirmed that during the simulations there were no packet drops due 
to queue overflow at the router output queues. All the drops were due to the 
selective discarding mechanism built into the data capsules. The problem with 
this approach is that large bursts are penalised. Indeed we observed that a steep 
transition from a very fast link to a very slow one sometimes prevents sessions 
from achieving full link utilisation. 

3.4 Subscription Mechanism

Subscribe capsules are used by the session receivers to indicate the desired sub- 
scription level (probing), and by the active routers to control congestion (prun- 
ing). 



Pruning. When long-term congestion is detected at an outgoing interface of 
a node, the application itself (through its data capsules running in the router) 
decreases its subscription level for that interface by injecting a subscribe capsule 
with a lower subscription level into the local Execution Environment (EE). The 
subscribe capsule is processed as if it had come from the congested outgoing in- 
terface. It might cause the traffic to be pruned only for the concerned interface, 
but if no other interfaces are requesting the pruned levels, the subscription cap- 
sule travels upstream, pruning the tree at the level of the maximum subscription 
level required locally. 

One of the difficulties here is to determine exactly when to prune. Several 
criteria are possible, but we would like to find one that does not require additional 
state to be kept in the active nodes. We cannot prune too fast, in order to allow 
some bursts to go through. And we cannot delay the pruning decision too long, 
in order to minimise resource waste. Ideally we should be able to prune in less 
than a round-trip time to the nearest receiver, but this information is difficult 
to obtain, and would need additional per-session state. Instead, we rely on the 
average prices, calculated using a weighted moving average. The average price 
can be calculated and exported by the outgoing link interfaces, in the same way 
as link statistics are kept for network management purposes. 
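
As an illustration, a prune decision that needs no per-session state beyond the current subscription level could look like the sketch below. The criterion itself is an assumption made for this example: layers are dropped from the top while their budget is below the average price exported by the interface. The text only states that the decision relies on average prices. The function name `pruned_level` and its parameters are hypothetical.

```python
from typing import Callable


def pruned_level(current_level: int, avg_price: float,
                 budget: Callable[[int], float]) -> int:
    """Return the subscription level to keep on a congested interface.

    Assumed criterion: drop the top layers whose budget is below the
    long-term average price exported by the outgoing interface.
    """
    level = current_level
    while level > 0 and budget(level) < avg_price:
        level -= 1
    return level
```

If the returned level is lower than the current one, a data capsule would inject a subscribe capsule carrying that level, as described in the pruning procedure above.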

Note that the state of the multicast tree downstream from the pruned branch 
is not immediately updated following the prune, since the subscribe capsule is- 
sued (if any) travels only upstream. In order to ensure that any stale subscription 
state is appropriately refreshed, each receiver periodically spawns subscribe cap- 
sules containing the minimum subscription level experienced along the path (this 
information is reported by the arriving data capsules). In the interval between 
a prune and a refresh, subscribe capsules containing the pruned level might be 
ignored. This might have the positive side effect of temporarily blocking requests 
for a layer that has just been eliminated, but the negative effect of delaying the 
probe process (leading to a slower adaptation). 

Probing. Since a capsule executing inside an active node has no means of 
knowing whether there are downstream nodes still interested in a layer which 
was pruned long ago due to congestion, the decision to probe for bandwidth 
is always taken by the receivers. To be able to probe, each receiver needs to 
monitor the number of layers it effectively receives, which might be different 
from its desired subscription level (since layers might have been pruned by the 
upstream routers due to congestion). After a period in which the quality achieved 
is considered good with respect to the effective number of layers received, such 
that this number is inferior to the desired one, a receiver decides to probe for 




188 Lidia Yamamoto and Guy Leduc 



bandwidth which might eventually be available. To do that it sends a subscribe 
capsule increasing its subscription level by n layers. Currently n is set to one 
for simplicity, although it is also possible (and better) to calculate it using the
available price and budget information at the receivers. 

The decision of when to probe is a very difficult one. An ideal probe criterion
should virtually eliminate useless probes, while at the same time allowing a
relatively fast reaction to grab available resources. These two goals conflict,
and therefore a compromise must be found. In the simulations we show in this
paper, the probe criterion used is a combination of conditions: (i) a minimum
interval between consecutive probes must be kept; (ii) the budget for the new
subscription level desired must be greater than the average price observed;
(iii) prices must be stable enough, i.e. the receiver should observe a price
variation under a certain threshold during an observation period before
probing; (iv) the maximum subscription level received must stay unchanged
during the observation period; (v) there should be no packet losses during the
observation period. In fact this last condition is redundant when all nodes are
active, because when the prices are stable enough there are no losses either.
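
The five conditions can be combined as in the following sketch. The record `Observation`, the function `should_probe`, the default thresholds, and the choice of measuring price variation as the difference between the maximum and minimum observed price are all assumptions for illustration; the text does not give concrete values or a variation measure.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Observation:
    """What a receiver saw during one observation period (hypothetical)."""
    prices: Sequence[float]   # prices reported by arriving data capsules
    levels: Sequence[int]     # maximum subscription level received, per sample
    losses: int               # packets lost during the period


def should_probe(now: float, last_probe: float, obs: Observation,
                 next_budget: float, min_interval: float = 1.0,
                 max_variation: float = 0.1) -> bool:
    """Evaluate probe conditions (i) to (v) for one receiver."""
    if not obs.prices or not obs.levels:
        return False  # nothing observed yet, so no basis for a probe
    avg_price = sum(obs.prices) / len(obs.prices)
    return (
        now - last_probe >= min_interval                          # (i)
        and next_budget > avg_price                               # (ii)
        and max(obs.prices) - min(obs.prices) < max_variation     # (iii)
        and len(set(obs.levels)) == 1                             # (iv)
        and obs.losses == 0                                       # (v)
    )
```

Here `next_budget` stands for the budget of the subscription level the receiver would like to reach with the probe.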

When a subscribe capsule arrives at a node, it first checks the router interface 
from where it came: if this interface shows persistent congestion (according to the 
same criterion used to prune congested branches), the subscribe capsule simply 
decides to terminate its execution. This results in a loose form of “self-admission 
control” which filters out potentially harmful probe requests. In order to avoid 
denial of service for new flows, this check is only performed when multicast 
routing state is already present for the source on the concerned interface. This 
naive filtering does not take into account the actual new bandwidth introduced 
into the system when a probe is accepted, therefore does not guarantee that an 
accepted probe will not cause congestion. This mechanism could be improved, 
for example by using explicit traffic information in probes (e.g. peak, average 
bandwidth). For the moment, however, we try to keep it simple and open for 
different types of traffic envelopes. 
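
The order of the two checks in this self-admission control can be made explicit with a very small sketch. The function name and its boolean parameters are hypothetical; how persistent congestion is detected is left to the same criterion that is used for pruning.

```python
def admit_probe(has_routing_state: bool, persistently_congested: bool) -> bool:
    """Decide whether a subscribe capsule continues or terminates at a node.

    `has_routing_state` tells whether multicast routing state for the source
    already exists on the interface the capsule came from.
    """
    if not has_routing_state:
        # New flow on this interface: always let it through, so that
        # congestion cannot cause denial of service for newcomers.
        return True
    # Existing flow: a probe towards a congested interface is dropped.
    return not persistently_congested
```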

3.5 Summary 

To summarise, in ALMA the decision to prune is always taken inside interme- 
diate active nodes (by data or subscribe capsules), while the decision to graft 
(probe) is always taken by the receivers. This is a way of distributing the deci- 
sion load among AN nodes, such that the right operation is performed where the 
data is available. As a result, in the active nodes the reaction to congestion is 
fast enough, using the combination of data filtering (for short term congestion) 
and tree pruning (for persistent congestion). 

Probing for available bandwidth is a task assigned exclusively to the receivers, 
therefore some delay can be expected from the moment when some bandwidth is 
released to the moment when new flows start to use it. The duration of this delay 
is a trade-off between how fast a reaction is desired and the amount of resources 
(bandwidth and active processing) we are ready to spend in the probing process. 
In the active routers, we try to minimise the impact of probes on the downstream
congestion level by ignoring probe requests during congestion periods. 

The feedback information is carefully used such that no extra packets are 
generated in a situation of congestion. The amount of state kept in each active 
node is of the order of the state necessary to maintain one IP multicast group. 
The scheme is therefore as scalable as any other multicast-based traditional 
scheme. 

4 Simulations 

We simulated ALMA in order to study its feasibility from the point of view of 
the adaptation to the available bandwidth and the competition among different 
sessions. 

First of all, we designed and implemented a simplified EE and a NodeOS 
module over the NS simulator [19] in order to be able to simulate the execution 
of active packets in the NS network nodes. The simulated EE is based on the 
execution of capsules written in the Tcl language.

The percentage of queue occupancy is used to indicate the load on the link. 
The link prices are calculated from the instantaneous and average queue lengths 
at each outgoing interface. For the average queue length, a modified RED [8] 
queue is used. The RED queue is inactive (i.e. the maximum and minimum 
thresholds are set to the end of the queue) so that it behaves like a FIFO 
queue with a drop tail policy, and it is only used to compute the average queue 
length. This average is computed as an exponential weighted moving average 
(EWMA) [8], with its weight factor adjusted to achieve the desired convergence 
time. The RED module was also modified to compute the average when the 
packet leaves the queue, instead of when the packet enters the queue. This is 
because we want the rhythm of updates to be regular and proportional to the 
link speed instead of the input rate. 
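
A minimal sketch of such a departure-driven average is given below, in Python rather than in the simulator's own code. The class `DepartureAverage` and the helper `weight_for` are hypothetical names. The helper uses the standard property of an EWMA that the contribution of an old value decays as (1 - w)^n after n updates, which is one possible way of relating the weight factor to a convergence time; the text does not say how the weight was actually chosen.

```python
class DepartureAverage:
    """EWMA of the queue length, updated each time a packet leaves the queue."""

    def __init__(self, weight: float) -> None:
        if not 0.0 < weight <= 1.0:
            raise ValueError("weight must be in (0, 1]")
        self.weight = weight
        self.average = 0.0

    def on_departure(self, queue_length: int) -> float:
        # avg <- (1 - w) * avg + w * q, written in incremental form.
        self.average += self.weight * (queue_length - self.average)
        return self.average


def weight_for(updates: int, residual: float = 0.05) -> float:
    """Weight for which an old value decays to `residual` after `updates` updates."""
    return 1.0 - residual ** (1.0 / updates)


def occupancy_percent(average: float, capacity: int) -> float:
    """Average queue occupancy as a percentage of the queue capacity."""
    return 100.0 * average / capacity
```

Because the update is tied to departures, a link that serves packets at a constant rate refreshes its average at a constant rhythm, independently of how bursty the arrivals are.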




Fig. 2. Topology used in the simulations 






The topology for the simulations is depicted in Fig. 2. All nodes are active.
Links L1, L2 and L3 are bottlenecks with capacity 1.6 Mbps, 1.0 Mbps and 0.8
Mbps, respectively. The other links have a capacity of 4.0 Mbps each. Links L4
and L5 have propagation delays of 100ms and 500ms respectively, while all the
other links have 10ms. Every link has an inactive RED queue of 20 packets.
Each of the sources s1 and s2 generates a stream with an average rate of 2 Mbps
divided into 5 layers of equal average rate. The packet size is 500 bytes for both
sessions. Receivers r1,j (j = 1..5) subscribe to the 5 layers of s1 at around t = 1s,
while receivers r2,k (k = 1..3) subscribe to s2 after about one third of the total
simulation time (i.e. around t = 20/3 ≈ 6.6s), and leave the session the same
amount of time later (t ≈ 13.3s). The receivers are not synchronised, therefore
they join the session at slightly different times.
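
As a rough aid for reading the results, the arithmetic implied by this setup can be written down: each layer carries 2 Mbps / 5 = 0.4 Mbps on average. The sketch below counts how many whole layers fit into each bottleneck, under the simplifying assumptions that rates are exactly their averages and that sessions crossing a link share it equally. Which sessions actually cross which bottleneck depends on the topology of Fig. 2, which is not reproduced here, so the two-session column is only indicative.

```python
LAYERS = 5
STREAM_KBPS = 2000                     # 2 Mbps average source rate
LAYER_KBPS = STREAM_KBPS // LAYERS     # 400 kbps per layer


def layers_that_fit(capacity_kbps: int, sessions: int = 1) -> int:
    """Whole layers fitting in an equal share of a link (simplified view)."""
    share = capacity_kbps // sessions
    return min(LAYERS, share // LAYER_KBPS)


for name, capacity in (("L1", 1600), ("L2", 1000), ("L3", 800)):
    # Columns: link, layers with one session, layers with two sessions.
    print(name, layers_that_fit(capacity), layers_that_fit(capacity, sessions=2))
```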

Fig. 3. Evolution of the rates in time for the members of sessions 1 and 2



Figure 3 shows the evolution of the rates in time for the first and second
sessions. We observe that after about one second the receivers get only the
amount of bandwidth that fits into the bottleneck links, and that the sources
likewise stop sending the excess traffic after about one second. When the second
session starts, the system accommodates it, and after about a second the two
sessions are using virtually the same amount of bandwidth. We also see that
when the second session terminates, the receivers of the first session are able to
reuse the released bandwidth after a couple of seconds. Note that receivers r1,4
and r1,5 have a behaviour which is similar to r1,3, indicating that the additional

the delay on the subscription levels observed at receivers r1,3, r1,4, and r1,5 is
relatively small, confirming what Fig. 3 also shows us.

Fig. 6. Percentage of data capsules discarded for r1,1 (left) and r1,2 (right)



Figure 6 shows the filtering behaviour for each layer, again taking r1,1 and
r1,2 as samples. In this figure, the percentage of data capsules lost, as calculated
by each of the receivers over an interval of one second, is plotted. Since queue
overflow does not occur, and link errors are not simulated, all the losses plotted
correspond to filtered packets. If we compare with Fig. 5, we can see that the
peaks of loss that a given layer suffers correspond to instants immediately before
the layer is pruned. Note that the losses observed at t ≈ 15.5s are due to the
unsuccessful probe previously sent by r1,2 (observed in Fig. 5 (left)).



5 Conclusions and Future Work 

We presented an active multicast adaptation protocol that addresses receiver 
heterogeneity not only by performing selective filtering on data packets but also 
by pruning unused multicast tree branches and probing for available bandwidth. 
The protocol uses source-based reverse path routing, does not need multicast 
groups, and is based on hierarchical layers that are built into its capsule types. 
It is independent of scheduling algorithms - simple FIFO queueing is enough
- but counts on the ability of the active nodes to export link parameters that 
can be used in the adaptation process. 

The protocol has been shown to be a feasible AA, and it motivates us to pursue
further research in this direction. The next immediate steps are the simulation
and implementation of the mechanisms described over more complex and realistic
AN environments consisting of multiple node and link types, as well as in a fully
heterogeneous network in which some of the routers are active while others are
not. Other topics for future research include: improving the reactiveness of ALMA
in scenarios presenting multiple short duration sessions; covering routing issues
such as the scalability to a large number of sessions with few receivers; trading
several types of resources; addressing budget assignment and management issues
within a session; and dynamically deploying price functions.

The task of performing adaptation did not turn out to be trivial even when 
using all the AN support we could imagine. Basically the same trade-offs as in
classical control algorithms remain, such as stability versus reaction time, feed- 
back availability versus bandwidth consumption, etc. On the other hand, we can 
easily choose where to place a given functionality (e.g. pruning of unused tree 
branches), and therefore try to place it as close as possible to the data it needs 
(in this article, congestion information). The resulting protocol is able to share 
the decision load among all active nodes involved in the session, with most of 
the complexity still at the receivers, due to the task of demultiplexing and de- 
coding. The amount of state and processing in the active nodes is kept as small 
as possible, and does not increase with the number of layers used. 

As a side effect, a slightly different model of active node pops up: a model in 
which the intelligence is distributed among the active applications representing 
the user interests, and the active resource managers representing the network 
provider interests. These different types of agents cooperate to seek configura- 
tions which are both locally and globally optimal with respect to the resources 
used, user satisfaction and provider revenues. In [27] we have developed this 
model further, and we are now redesigning ALMA based on those results. 


Abstract. Policy-based networks can be customized by users by
injecting programs called policies into the network nodes. So if
general-purpose functions can be specified in a policy-based network,
the network can be regarded as an active network in the wider sense. In
a policy-based network, two or more policies must often cooperate to
provide a high-level function or policy. To support such building-block
policies, two architectures for modeling a set of policies have been
developed: pipe-connection architecture and label-connection
architecture. It is shown that rule-based building blocks are better for
policy-based network control and that the label-connection architecture
is currently better. However, the pipe-connection architecture is better
in regard to parallelism, which is very important in network
environments.



1 Introduction 

Active networks are networks that are customizable by users, and their behavior can
be modified by injecting programs. An appropriate first step toward active networks
is to build an extensible policy-based network, because such a network can be
customized by users by deploying policies and because programs, which are called
policies, can be injected into the nodes in a policy-based network. For example, QoS
(Quality of Service) policies, which may be device-dependent, can be deployed to
QoS-ready routers, so the network is customized to each user. Each user can have
their own virtual network with customized QoS parameters such as a specific
bandwidth or delay. The function of policies has been limited to a certain area, e.g.,
QoS or security, but it can be extended. If it is extended and the policy-based
network becomes general-purpose, it becomes an active network.

In policy-based networks, two or more policies must often work together. For 
example, in Diffserv, a policy for marking a DSCP (Diffserv Code Point) and a policy 
for queue control, which operates on the same packets, must cooperate. They must 
cooperate because the latter tests the DSCP that the former marks. This is the simplest
case of cooperation, but there are probably more complicated cases. A higher-level 
function or policy of the network is provided by a combination of lower-level 
functions or building-block policies. 
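
The Diffserv example can be made concrete with a small sketch. It is written in Python purely for illustration and does not use any policy language from this paper; the `Packet` record, the address prefix `10.1.` and the queue names are hypothetical. The value 46 is the standard DSCP of the Expedited Forwarding per-hop behaviour.

```python
from dataclasses import dataclass

EF = 46  # standard DSCP value for Expedited Forwarding


@dataclass
class Packet:
    src: str
    dscp: int = 0


def marking_policy(packet: Packet) -> None:
    """if (source belongs to a premium customer) mark the DSCP as EF;"""
    if packet.src.startswith("10.1."):   # hypothetical customer prefix
        packet.dscp = EF


def queue_policy(packet: Packet) -> str:
    """if (DSCP is EF) use the priority queue; otherwise best effort."""
    return "priority" if packet.dscp == EF else "best-effort"


def handle(packet: Packet) -> str:
    # The two policies cooperate only through the DSCP field:
    # the second one tests what the first one marked.
    marking_policy(packet)
    return queue_policy(packet)
```

The two functions share no state other than the DSCP carried by the packet, which is what makes them usable as separate building blocks.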

To build a building-block-based policy system, Kanada proposed the first version
of a rule-based component architecture as a MIB [Kan 99] [Kan 00a], and he also 
developed an architecture based on a logic programming language [Kan 00b]. In 
these architectures, a policy is constructed using rule-based building blocks, or small 
policies, which are connected by virtual flow labels [Kan 99] or logical variables 
[Kan 00b]. The first architecture was restricted to a QoS domain and the building 
blocks were built-in there, but the second architecture was general-purpose and 
building blocks could be created by combining preexisting building blocks. 

In the current work, two new building-block architectures based on the above two
architectures have been developed. One is a pipe-connection architecture, which is a 
refinement of the second architecture, and the other is a label-connection architecture, 
which is a generalized version of the first architecture. In Section 2, the technical 
requirements for policy-based networking are investigated and the reason such 
architectures are required is explained. The two building-block architectures are 
described in Section 3. Examples of DiffServ configuration for routers are given in
Section 4. The two architectures are compared in Section 5. 



2 Why Rule-Based Building Blocks? 

Policy-based networks were originally developed for reducing the complexity in
configurations of a network and its nodes. Policies are replacements of vendor- and 
device-dependent configuration commands, and they will soon be standardized by the 
IETF (Internet Engineering Task Force) Policy Framework Working Group. Policies 
are derived from SLSs (service-level specifications). An SLS is a specification 
regarding the behavior of the network and it is derived from an SLA (service-level 
agreement), which is a contract between a network operator and a user or between 
two or more network operators. 

There are five technical requirements regarding policy-based (active) networks. 
The first requirement is that an SLS should be translated into policies easily by hand 
or mechanically if the SLS is simple. An SLS is described declaratively, but not 
procedurally, by using a natural language or a formal language. If a policy depends 
on the specific procedure that implements the required function, it is not easily 
generated from the SLS. So the policy should be declarative. In particular, a policy is 
usually considered as a collection of if-then rules such as 

if (condition) action; 

because it is usually considered easier to translate an SLS into if-then rules. 
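
As a rough illustration of such an if-then policy, the following Python sketch holds a policy as a list of condition/action pairs. The names `Rule`, `apply_policy`, and `from_user_a`, and the dictionary used to represent a packet, are assumptions made for this example only; they are not part of the architectures described in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Packet = Dict[str, object]  # hypothetical packet representation


@dataclass
class Rule:
    condition: Callable[[Packet], bool]
    action: Callable[[Packet], None]


def apply_policy(rules: List[Rule], packet: Packet) -> bool:
    """Execute the first rule whose condition matches; return True if one did."""
    for rule in rules:
        if rule.condition(packet):
            rule.action(packet)
            return True
    return False


def from_user_a(p: Packet) -> bool:
    # Assumed test for "source IP subnet is 192.168.1.*".
    return str(p["src"]).startswith("192.168.1.")


# Two rules with mutually exclusive conditions, so their order does not matter.
policy = [
    Rule(from_user_a, lambda p: p.update(dscp=46)),
    Rule(lambda p: not from_user_a(p), lambda p: p.update(dscp=0)),
]
```

Because the two conditions are mutually exclusive, reversing the list gives the same result for every packet, which is the kind of independence between rules that the third requirement below relies on.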

The second requirement is that a policy must be executable. A policy is not just 
data but a program because it changes the behavior of a network or network nodes. 
So when copies of the policy are deployed to network nodes, they must be executable.

The third requirement is that dynamic modification of a policy must be possible.
Because the network never stops, a policy is often modified while it is being used. So
a policy must be modular. If a policy consists of mutually independent rules, a rule
can be added, modified, or removed without affecting other rules.

The fourth requirement is that, even if the SLS is complicated, translating the SLS
into policies must be possible. A complicated SLS should be expressed in a
structured form, so the policies that are derived from the SLS should also be
structured. Thus, there should be a means of structuring the policies. This means a
policy should be constructed from components or building blocks.

The fifth requirement is that an optimized policy should be expressible by using 
the same architecture. A naively expressed policy may be inefficient. Such a policy 
should be optimized automatically or by hand. Both the original policy and the 
optimized policy must be represented by the same language. Otherwise, it is difficult 
to prove they are semantically equivalent. 

A method that meets these requirements is to represent policies by a rule-based 
building-block architecture. A possible translation process from SLS to device 
configurations through rule-based building blocks is illustrated in Fig. 1.




Fig. 1. Service- to device-level policy-translation process 



The first three requirements can be satisfied by using rule-based models or 
languages that are similar to Prolog or OPS5 [For 81]. Defining a policy declaratively 
and translating it into an executable program is difficult if the policy is complicated. 
However, such translation is easier if the policy representation is properly selected. In 
the field of artificial intelligence, knowledge representations were extensively studied 
in the 1970s and 1980s, and rule-based programming languages, especially logic
programming languages (such as Prolog) and languages for developing production- 
system-based expert systems (such as OPS5), were developed. These languages are 
declarative and executable at the same time. We can apply the results of such 
research to policy-based networking. In these languages, rules can be written as
mutually independent rules; i.e., rules can be defined such that only one rule can be
applied in any specific situation even if the order of rules is changed. 

The fourth and fifth requirements can be satisfied by using a component-based
architecture. A complicated policy can be expressed by using building blocks.
Complexity can be reduced by defining a larger building block as a collection of
building blocks. Both primitive and composed building blocks are rule-based and
follow the same semantics if a logic programming language is used. A policy can be
optimized by program transformations. If the optimization is local, it is done by
replacing a set of building blocks by another set of building blocks.



3 Two Building-Block Model Architectures

3.1 Structure of Building Blocks 

In both architectures, a policy or policy rule consists of building blocks and 
connections between them. A building block is a rule or a set of rules. The structure 
of a building block is roughly similar to that of a policy in the policy information 
model [Moo 00], and the structure of a rule is also similar to that of a policy rule. A 
building block is executed as follows. A rule is selected if the input packet matches 
the condition specified in the rule. Then the action specified in the rule is executed 
and an output packet is generated. If no condition in the rule set matches the input 
packet, no action is taken and no packet is outputted. So a building block inputs a 
stream of packets, or a flow, filters it, and maybe splits it into multiple flows or 
merges multiple flows into one. 

A network node can be modeled as a building block or a collection of building
blocks. A building block has input ports and output ports. Building blocks are
connected by connecting an output port of one block to an input port of another. So
the function of the network node can be represented by a DAG (directed acyclic
graph) in which the vertices represent the building blocks. The whole network can
also be modeled by a building-block model. Each function in the network domain
can be represented by a DAG. The task of a policy server is to decompose this DAG
into subgraphs and to deploy each subgraph to a router in the domain. The edges
between the subgraphs are mapped to the lines between the routers.
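
The decomposition step can be sketched as follows. This is only an illustration under stated assumptions: the function name `decompose`, the `placement` mapping from building blocks to routers, and the router names are hypothetical, and the paper does not prescribe how a policy server chooses the placement.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (source building block, destination building block)


def decompose(edges: List[Edge], placement: Dict[str, str]):
    """Split a building-block DAG into per-router subgraphs.

    Returns (subgraphs, links). subgraphs maps a router name to the edges
    that stay inside that router; links is the set of (router, router)
    pairs joined by an edge that crosses between subgraphs.
    """
    subgraphs: Dict[str, List[Edge]] = defaultdict(list)
    links: Set[Tuple[str, str]] = set()
    for src, dst in edges:
        r_src, r_dst = placement[src], placement[dst]
        if r_src == r_dst:
            subgraphs[r_src].append((src, dst))
        else:
            links.add((r_src, r_dst))  # mapped to a line between routers
    return dict(subgraphs), links


# Assumed example: six building blocks, with the scheduler placed on a
# different (hypothetical) router from the other five.
edges = [
    ("Classification", "Metering"), ("Classification", "Marking2"),
    ("Metering", "Marking1"), ("Metering", "Discarding"),
    ("Marking1", "Scheduling"), ("Marking2", "Scheduling"),
]
placement = {b: "edge" for b in
             ("Classification", "Metering", "Marking1", "Discarding", "Marking2")}
placement["Scheduling"] = "core"
subgraphs, links = decompose(edges, placement)
```

In this example four edges stay inside the "edge" router and the two edges into the scheduler collapse into a single line from "edge" to "core".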



3.2 Pipe-Connection Architecture 

The pipe-connection architecture is explained in Fig. 2. In this architecture, each
building block has a fixed number of input ports and output ports. Each input or 
output port has a port identifier. Port identifiers can be numbers or alphanumeric 
identifiers, but they are assumed to be ordinal numbers here. The example used in 
Fig. 2 is a Diffserv router configuration, which will be explained more in the next 
section. 




Fig. 2. A model using the pipe-connection architecture 

Building blocks are connected by pipes. The beginning of a pipe is connected to
an output port of a building block. The end of the pipe is connected to an input port
of another building block. A packet stream flows in each pipe. When a packet flows
into a building block, one (or zero) packet flows out from the building block. Pipes
are uniquely identified by their tags. Packets come into a building block through an
input port and go out through an output port. Usually, a packet is outputted to only
one of the output ports, i.e., packets are not duplicated implicitly, and two packets that
come from the same or different input ports are never merged into one packet.

In Fig. 2, six building blocks are given: Classification, Metering, Marking1,
Discarding, Marking2, and Scheduling. The Classification and Metering building
blocks contain two sub-blocks, or rules. Other building blocks can contain only one
sub-block. The Classification and Metering building blocks are connected by a pipe
named C1, and C1 connects the output port 1 to the input port 1. The Classification
building block has two output ports. Each packet that flows into this building block 
flows out from only one of these output ports. A Discarding building block has no 
output ports. So packets that flow into a discarding building block never flow out. 
The Scheduling building block has two input ports, but other building blocks have 
one input port in Fig. 2. 

This architecture can be properly represented by using a backward-chaining 
predicate-logic-based language similar to GHC [Ued 85], Concurrent Prolog [Sha 86],
or Parlog [Cla 86]. These languages are suited to describing data stream processing. 
So the pipe-connection models can be expressed directly. A language for this 
architecture, which is called SNAP (Structured Network programming by And- 
Parallel language), was defined by Kanada [Kan 00b]. In SNAP, each building block 
is represented by a predicate, and a predicate consists of clauses (i.e., rules). Building 
blocks are connected by logical variables. So a logical variable is used as a pipe. The 
model in Fig. 2 can be expressed in SNAP as follows: 

ef_ingress(Si, So) :-    // Building block ef_ingress inputs stream Si and outputs stream So.
    or(filter[Source_IP = 192.168.1.*](Si, C1) |
           // Packets (in Si) whose source IP subnet is 192.168.1.* are outputted to C1.
           or(meter[Average_rate_max = 1Mbps](C1, P1) |
                  // Packets (in C1) within the bandwidth limit are outputted to P1.
                  mark[DSCP = 46](P1, M1)
                  // Packets in P1 are marked and outputted to M1.
              ; otherwise(C1, P2) |    // Packets (in C1) that do not meet other (only one here)
                                       // conditions in the case structure are outputted to P2.
                  discard(P2)          // All the packets in P2 are discarded.
           )
       ; otherwise(Si, C2) |    // Packets (in Si) that do not meet other conditions
                                // in the case structure are outputted to C2.
           mark[DSCP = 0](C2, M2)    // Packets in C2 are marked and outputted to M2.
    ),
    schedule[Algorithm = priority](M1, M2, So).
    // Streams M1 and M2 are merged into So. A queue is assigned to M1,
    // and another queue is assigned to M2. They are scheduled by a priority
    // scheduler. The priority of M1, which is the first argument, is higher.

This program defines a building block called ef_ingress, which has one input port
and one output port. This example will not be explained further here, but an 
explanation of a very similar program can be found in Kanada [Kan 00b]. 
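
For readers unfamiliar with parallel logic languages, the dataflow of the pipe-connection model can be imitated in Python. The sketch below is not SNAP and makes several simplifying assumptions: a pipe is a finite list rather than a stream, the meter is replaced by a caller-supplied `in_profile` predicate, and the priority scheduler simply emits the first input before the second. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, object]
Pipe = List[Packet]  # a finite stand-in for a packet stream


def case_split(si: Pipe, cond: Callable[[Packet], bool]) -> Tuple[Pipe, Pipe]:
    """Send each packet to exactly one of two output pipes (match, otherwise)."""
    matched: Pipe = []
    other: Pipe = []
    for p in si:
        (matched if cond(p) else other).append(p)
    return matched, other


def mark(si: Pipe, dscp: int) -> Pipe:
    """Write a DSCP into every packet and pass all packets on."""
    return [dict(p, dscp=dscp) for p in si]


def schedule_priority(high: Pipe, low: Pipe) -> Pipe:
    """Merge two pipes; packets on the first port leave before the second."""
    return high + low


def ef_ingress(si: Pipe, in_profile: Callable[[Packet], bool]) -> Pipe:
    c1, c2 = case_split(si, lambda p: str(p["src"]).startswith("192.168.1."))
    p1, p2 = case_split(c1, in_profile)  # metering: P1 conforms, P2 does not
    m1 = mark(p1, 46)
    del p2                               # discarding: P2 has no output pipe
    m2 = mark(c2, 0)
    return schedule_priority(m1, m2)
```

Each local variable plays the role of a pipe and is assigned exactly once, which mirrors the use of logical variables as pipes.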







Yasusi Kanada 



3.3 Label-Connection Architecture 



The label-connection architecture is explained in Fig. 3. In this architecture, each 
building block has only one input port and one output port. Building blocks contain 
one or more rules. For example, both the Classification and Metering building blocks 
have two rules. The execution order of building blocks is constrained. The order 
constraints are represented by a directed graph. In Fig. 3, six building blocks are 
connected by directed edges. For example, the Classification building block is 
connected to the Metering and Marking2 building blocks. So the Metering and 
Marking2 building blocks will be executed just after executing the Classification 
building block. Whether the Metering or Marking2 building block is executed 
depends on the conditions of the rules in the Metering and Marking2 building blocks 
and the value of the packet. If the packet matches a condition in the Metering 
building block, this building block is executed. 




Metering
    if Label == C1 && Average_rate <= 1Mbps then
        Label = P1
    if Label == C1 && Average_rate > 1Mbps then
        Label = P2

Marking1
    if Label == P1 then
        Label = dscp(46)
        Priority = high

Discarding
    if Label == P2 then
        Discard

Marking2
    if Label == C2 then
        Label = dscp(0)
        Priority = low

Scheduling
    Algorithm = priority

Fig. 3. A label-connection model 

Each rule attaches a tag called a label, which contains an integral value, to each
packet in a flow or to a packet flow. There are two types of label. One type is a real
label that may be inside the packet, for example, as a DSCP or an MPLS label. The
other type is a virtual label or VFL (virtual flow label, named "Label" in Fig. 3).
The value of the VFL is not put on the packet and the VFL can be regarded
as a tag put outside of the packets (Fig. 4). Only one VFL can be attached to a flow
or packet. In Fig. 3, the rules in the Classification and Metering building blocks
assign a VFL, and the Marking1 and Marking2 building blocks assign a DSCP as the
label. The initial value of a VFL is "undefined" (a specific value).

A flow or a packet may have two or more tags. In Fig. 3, the Marking1 and Marking2
building blocks assign a value to a tag named "Priority". Tags except the label are
called attributes. The priority attribute is used for the priority scheduling in
the Scheduling building block.

The order of execution can be uniquely defined by defining and referring to VFL
values appropriately. For example, the first rule in the Classification building block
assigns value C1 to the VFL, and the second rule assigns value C2. The rules in the
Metering building block assume the VFL value is C1. So these rules can be
executed only after executing the first rule in the Classification building block. The
only rule in the Marking2 building block assumes the VFL value is C2. So this rule
can be executed only after executing the second rule in the Classification building
block.
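
The way labels and the order constraints together determine execution can be sketched in Python. This is a minimal illustration and not the language proposed in this paper: the function names, the `rate_mbps` and `discarded` packet fields, and the tuple used for a DSCP label are assumptions; the rule contents follow Fig. 3 and the Classification rules of the ef_ingress example.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, object]
Rule = Tuple[Callable[[Packet], bool], Callable[[Packet], None]]


def run_rule_set(rules: List[Rule], pkt: Packet) -> bool:
    """Apply the first rule whose condition matches; report whether one did."""
    for cond, act in rules:
        if cond(pkt):
            act(pkt)
            return True
    return False


def execute(rule_sets: Dict[str, List[Rule]], order: Dict[str, List[str]],
            start: str, pkt: Packet) -> List[str]:
    """Follow the order graph, entering only rule sets whose rules match."""
    trace: List[str] = []
    candidates = [start]
    while candidates:
        for name in candidates:
            if run_rule_set(rule_sets[name], pkt):
                trace.append(name)
                candidates = [] if pkt.get("discarded") else order.get(name, [])
                break
        else:
            break  # no candidate rule set matched the packet
    return trace


rule_sets: Dict[str, List[Rule]] = {
    "Classification": [
        (lambda p: str(p["src"]).startswith("192.168.1."),
         lambda p: p.update(label="C1")),
        (lambda p: True, lambda p: p.update(label="C2")),  # OTHERWISE
    ],
    "Metering": [
        (lambda p: p["label"] == "C1" and p["rate_mbps"] <= 1,
         lambda p: p.update(label="P1")),
        (lambda p: p["label"] == "C1" and p["rate_mbps"] > 1,
         lambda p: p.update(label="P2")),
    ],
    "Marking1": [(lambda p: p["label"] == "P1",
                  lambda p: p.update(label=("dscp", 46), priority="high"))],
    "Discarding": [(lambda p: p["label"] == "P2",
                    lambda p: p.update(discarded=True))],
    "Marking2": [(lambda p: p["label"] == "C2",
                  lambda p: p.update(label=("dscp", 0), priority="low"))],
    "Scheduling": [(lambda p: True, lambda p: p.update(algorithm="priority"))],
}
order = {
    "Classification": ["Metering", "Marking2"],
    "Metering": ["Marking1", "Discarding"],
    "Marking1": ["Scheduling"], "Discarding": ["Scheduling"],
    "Marking2": ["Scheduling"],
}
```

The order graph only narrows the set of rule sets that may run next; which of them actually runs is decided by the label carried by the packet.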




(a) DSCP (a real label)    (b) A VFL



Fig. 4. DSCP and Virtual flow label 

The label-connection architecture can be properly represented by using a language for
production systems similar to OPS5 or other forward-chaining rule-based languages 
for developing expert systems. In such a language, each rule is an if-then rule; i.e., 
each rule has a condition and actions. This rule structure is very similar to that of a 
policy rule in the policy information models [Moo 00][Sni 00]. However, 
conventional languages for production systems have no method for structuring rules 
(i.e., building blocks) as sets, and for giving a partial order to them. So we should 
define a new language to represent this architecture. The model in Fig. 3 can be 
expressed by using such a language as follows: 

MODULE ef_ingress IS
    RULE SET Classification, Metering, Marking1, Discarding, Marking2,
             Scheduling;

    RULE SET ORDER
        Classification -> Metering, Marking2;
        Metering -> Marking1, Discarding;
        Marking1, Discarding, Marking2 -> Scheduling;

    RULE SET Classification IS
        IF Source_ip == 192.168.1.* THEN
            Label = C1;
        OTHERWISE
            Label = C2;

    ...

    RULE SET Scheduling IS
        IF true THEN    // This scheduler is always used.
            Algorithm = priority;
END ef_ingress;

This program defines a building block called ef_ingress. Ef_ingress contains six
rule sets. Their execution order is defined by the RULE SET ORDER definition, which
is followed by the RULE SET definitions. The execution order can be regarded as a
definition of a directed graph. Because the contents of the RULE SET definitions are
the same as shown in Fig. 3, most of them are omitted here.



4 Differentiated Services Using the Building-Block Models

4.1 Brief Introduction to Diffserv 

In the IETF, many working groups (WGs) are concerned with Internet QoS. In
particular, the Integrated Services (IntServ) WG specified guaranteed per-flow QoS
[Wro 97][She 97], and the Differentiated Services (Diffserv) WG has been working
on class-based QoS assurance [Ber 99]. "Per-flow" means that each flow of packets
between a source end-point and a destination end-point is treated individually.
"Class-based" means that flows are classified into service classes, and flows in the
same service class are treated in the same way. Per-flow control enables more
accurate QoS control, but requires much more network node resources. Because the
resources are limited, the class-based approach, i.e., DiffServ, seems more practical
for the Internet.

A network domain, which will usually be an autonomous system (AS), can be 
modeled as shown in Fig. 5. In DiffServ, the domain is a part of networks in which 
the same set of PHBs (per-hop behaviors) [Bla 98] is used. A DSCP (differentiated 
services code point) [Nic 98] is assigned to each PHB in this domain. The network 
consists of network nodes, such as routers, and the lines between them. 




Fig. 5. A Diffserv network 

The network interfaces of the routers that are connected to computers are called 
edge interfaces, and the interfaces that are connected between routers are called core 
interfaces. Edge interfaces that are connected to packet sources are called ingress 
interfaces, and those connected to packet destinations are called egress interfaces. 
Because communication lines are usually bidirectional, edge interfaces usually work 
as both ingress and egress interfaces. An interface may also be used as both an edge 
and a core interface. 

In a Diffserv network, IP (Internet protocol) packets are classified at ingress
interfaces and are marked in their DS (differentiated services) field [Nic 98]. The
value in the DS field is called the DSCP and indicates the service class that the packet
belongs to. At core interfaces, the QoS conditions of the packets are controlled
according to the DSCP.

IP packets are classified by using a classifier. A classifier uses a set of filtering
conditions, and each condition corresponds to an action. A combination of a
condition and a corresponding action can be regarded as an if-then rule. This rule
works for each packet. The behavior of an interface can be specified by using a set of
if-then rules. Classifiers used at ingress interfaces are called MF (multifield)
classifiers. An MF classifier mainly checks the following five items: source and
destination IP address, IP protocol, source and destination IP ports. The action taken
as the result of MF classification is usually marking, which means assigning a DSCP
to the DS field of the packets. Classifiers are also used at core and egress interfaces
in DiffServ, but in a different way. They are called BA (behavior aggregate) classifiers.
A BA classifier checks the DSCP. A resulting action may be to assign a priority to
the queue used for the packets.
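
The difference between the two kinds of classifier can be made concrete with a small Python sketch. It is illustrative only: the field names (`src_ip`, `dst_ip`, `proto`, `src_port`, `dst_port`), the rule format, and the function names are assumptions introduced here, and the default values are arbitrary.

```python
import ipaddress
from typing import Dict, List, Tuple

Packet = Dict[str, object]
MFRule = Tuple[Dict[str, object], int]  # (field pattern, DSCP to assign)


def field_matches(name: str, wanted: object, actual: object) -> bool:
    """Address fields match by subnet; all other fields match by equality."""
    if name in ("src_ip", "dst_ip"):
        return ipaddress.ip_address(str(actual)) in ipaddress.ip_network(str(wanted))
    return wanted == actual


def mf_classify(pkt: Packet, rules: List[MFRule], default_dscp: int = 0) -> int:
    """Multifield classification at an ingress interface: return the DSCP to mark."""
    for pattern, dscp in rules:
        if all(field_matches(k, v, pkt[k]) for k, v in pattern.items()):
            return dscp
    return default_dscp


def ba_classify(dscp: int, priorities: Dict[int, str], default: str = "low") -> str:
    """BA classification at a core interface: only the DSCP is examined."""
    return priorities.get(dscp, default)


mf_rules: List[MFRule] = [({"src_ip": "192.168.1.0/24"}, 46)]
pkt: Packet = {"src_ip": "192.168.1.20", "dst_ip": "10.0.0.1",
               "proto": 17, "src_port": 5004, "dst_port": 5004}
```

The MF classifier may inspect any of the five header fields, whereas the BA classifier needs nothing but the DSCP already written by the ingress interface.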



4.2 Building Blocks for DiffServ 

Six types of primitive building blocks for DiffServ are defined: filtering, metering, 
marking, discarding, scheduling, and merging rules. A previous version of these 
building blocks was described by Kanada [Kan 00b]. The building blocks defined 
here are similar to objects defined in Diffserv MIB [Bak 00], Diffserv PIB [Fin 00], or 
QoS Information Model [Sni 00]. However, the model described in this subsection
is different from the Diffserv MIB and PIB because it is rule-based, and the building
blocks in the model are more fine-grained than those in the information model. Most
of these building blocks can be used as is, can be enhanced for services other than
DiffServ, or can be used with other types of building blocks.

Filtering, marking, and discarding rule sets are applied to a packet stream only 
once because repetitive application of these rules is unnecessary. Metering, merging, 
and scheduling rules can be applied to a packet stream two or more times because 
repetitive applications of these rules are sometimes necessary. 

Rules are described using the following syntax: ruleTypeName[parameters]. 

Filtering rules represent a part of an MF or BA classifier. A filtering rule tests the 
IP packet header of each packet. This means it tests one or all of the DSCP, source 
and destination IP addresses, IP protocol, source and destination IP ports, and so on. 
These values are specified as parameters in the filtering rule. If a packet meets the 
condition in the rule, it is outputted to the stream. Otherwise it is dropped. Examples 
of filtering rules are 

filter[Source_IP = 192.168.1.*].    // A part of an MF classifier.
filter[DSCP = 46].    // A part of a BA classifier.

In a pipe-connection model, filtering rules have only one input port and one output 
port, and the names of pipes that are connected to the input and output ports of the 
rules must be specified. If the input pipe name is Si and the output pipe name is So, 
the rule can be described as 

filter[Source_IP = 192.168.1.*](Si, So).

Metering rules only pass the traffic that is conformant to the profile contracted by
an SLS (service-level specification). Metering rules can be implemented by using a
token-bucket meter or some other type of meter. The average maximum information
rate and the bucket size can be specified as parameters. An example of a metering
rule is

meter[Average_rate_max = 1Mbps]. 

In a pipe-connection model, metering rules have only one input port and one output 
port. 
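
A token-bucket meter of the kind mentioned above can be sketched as follows. The class name, the method name `conforms`, and the choice to pass the current time explicitly are assumptions made for the example; an actual router implementation would differ.

```python
class TokenBucketMeter:
    """Single-rate token bucket: tokens are bytes, refilled at the average rate."""

    def __init__(self, rate_bps: float, bucket_bytes: float) -> None:
        self.rate = rate_bps / 8.0      # refill rate in bytes per second
        self.capacity = bucket_bytes    # bucket size in bytes
        self.tokens = bucket_bytes      # the bucket starts full
        self.last = 0.0                 # time of the previous update (seconds)

    def conforms(self, size_bytes: int, now: float) -> bool:
        """Return True and consume tokens if the packet is within the profile."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


# Roughly meter[Average_rate_max = 1Mbps] with an assumed 1500-byte bucket.
meter = TokenBucketMeter(rate_bps=1_000_000, bucket_bytes=1500)
```

A packet for which `conforms` returns True would be passed to the output stream; in the EF example of Fig. 2 the remaining packets go to the discarding block.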

Marking rules write a DSCP into the DS field of the packets in the input stream. 
All the packets are outputted to the output stream. The only parameter for marking 
rules is the DSCP. An example is 

mark[DSCP = 46]. 

In a pipe-connection model, marking rules have only one input port and one output
port. 

Discarding rules discard packets in a stream. There are two types of
discarding rules in a label-connection model: an absolute discarding rule and random
discarding rules. The absolute discarding rule discards all the packets. Random
discarding rules discard packets by using a weighted random-early-discard (WRED)
algorithm. In a pipe-connection model, the function of random discarding rules is
included in scheduling rules, so separate random discarding rules do not exist there.
There are no parameters to be specified for the absolute discarding rule:

absoluteDiscard. 

In a pipe-connection model, the absolute discard rule has only one input port and no
output port. An example of a random discarding rule is

randomDiscard[QMin = 10kB, QMax = 20kB, PMax = 0.1].
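
Assuming that QMin, QMax, and PMax carry their conventional random-early-discard meaning (the paper does not spell this out), the drop decision can be sketched as below. The function names are hypothetical, and the average queue length is taken as an input rather than computed.

```python
import random
from typing import Callable


def red_drop_probability(avg_queue: float, q_min: float,
                         q_max: float, p_max: float) -> float:
    """Drop probability: 0 below q_min, linear up to p_max, 1 from q_max on."""
    if avg_queue < q_min:
        return 0.0
    if avg_queue >= q_max:
        return 1.0
    return p_max * (avg_queue - q_min) / (q_max - q_min)


def random_discard(avg_queue: float, q_min: float, q_max: float, p_max: float,
                   rng: Callable[[], float] = random.random) -> bool:
    """Return True if the arriving packet should be discarded."""
    return rng() < red_drop_probability(avg_queue, q_min, q_max, p_max)
```

With the parameters of the example rule and an average queue length halfway between QMin and QMax, the drop probability is half of PMax. A weighted variant would keep one such parameter set per traffic subclass.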

Scheduling rules are used for merging streams through scheduling. The parameters 
of a scheduling rule specify the method and parameters for enqueuing and dequeuing 
control. The scheduling algorithm and its parameters can be specified in scheduling 
rules, which can also be used for shaping control. The maximum or minimum
output rate (or both) can be specified. Examples are

schedule.    // Input packets are queued until they can go out. The queue size is default.
schedule[Algorithm = priority].

// Input packets are scheduled using a priority scheduling algorithm. 

// Input packets should have a priority attribute (See Fig. 3). 

In a pipe-connection model, merging packet flows must be explicitly specified. 
Merging rules are used to merge, without buffering, two or more flows.* 

merge(Si1, Si2, Si3, So).    // Flows through pipes Si1, Si2, and Si3 are merged into So.

A merging rule is not specific to Diffserv, and is usually required for the
pipe-connection architecture. In contrast, streams can be implicitly merged and no
merging rules are necessary in a label-connection model. No scheduling functions are
used for the flows inputted to a merging rule, because merging without buffers is
required here. If buffering is required, scheduling rules must be used for the merging
flows.

* A merging rule is necessary for this architecture because of the single-assignment constraint.

In a label-connection model, the execution order must be specified. The order for 
Diffserv is described in Fig. 6. Metering rules can be repeated because a flow can be 
policed with two or more conditions, and two or more out-profile traffic streams can 
be handled differently. Scheduling rules can be repeated because a hierarchical 
scheduler or shaper is sometimes required. 




Fig. 6. Execution order of building blocks in the label-connection model 



4.3 Expedited Forwarding Service Configurations 

Expedited forwarding (EF) service [Jac 99] is a virtual-leased-line service. Each
microflow is policed and aggregated into a flow at the ingress edge interfaces.
Packets marked with the EF DSCP are forwarded with high priority at core interfaces.

An example of an SLS for an EF service is as follows: 

IF the flow is from user A (the Source IP address is 192.168.1.*) THEN
    IF the information rate is within 1Mbps THEN
        Treat the flow as an EF traffic;
    OTHERWISE
        Drop the packets;
OTHERWISE
    Treat the flow as a best effort traffic;

The configuration for this service can be represented by the pipe-connection model in 
Fig. 2 and by the label-connection model in Fig. 3. The copies of the classifier, meter, 
markers, and discarder are deployed to the ingress edge interface, and the copies of 
the scheduler are deployed to each core interface. In the pipe-connection model,
Merging and (BA) Classification building blocks must be added to split the program
into an edge and a core interface as shown in Fig. 7, because a tagged pipe cannot be
used between these interfaces. No more building blocks are needed in the
label-connection model.













Edge interface Core interface 



Fig. 7. Ingress edge and core interface configurations for an EF service in the pipe-connection 
model 



4.4 Assured Forwarding Service Configurations 

Assured forwarding (AF) services [Hei 99] can be used for a wide range of services. 
An example is an Olympic service. There are gold, silver, and bronze classes of 
services in an Olympic service. The gold class gets the highest priority, the silver the 
second, and the bronze the third. The bronze service gets even higher priority than 
the best effort service. In each AF class, there may be three different subclasses of 
traffic that share the same queue in each network node but have different queue 
depths or different parameters for WRED. There are three subclasses for each class,
AFn1 to AFn3, assigned for AF services in RFC 2597 [Hei 99].

An example of core interface configuration for an AF service is shown in Fig. 8.
Only the core configuration for AF1 is shown here. The input stream is forked into
four, AF11, AF12, AF13, and other streams, by a BA classifier. Streams AF11 to
AF13 are merged using a scheduler with only one queue but three different discarding
thresholds. Packets in stream AF11, i.e., the first argument for "schedule", are
discarded only when the queue is full, i.e., when the queue is filled with 100 kB of
data. Packets in stream AF12, i.e., the second argument, are discarded when the
queue is filled with 80 kB of data. Packets in stream AF13, i.e., the third argument,
are discarded when the queue is filled with 60 kB of data. Packets in AF streams and
those in other (best effort) streams are merged by using a weighted round robin
(WRR) scheduler.
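
The idea of one queue shared by three subclasses with different discarding thresholds can be sketched in Python. This is a simplification and an assumption on our part: packets are dropped deterministically once the shared queue depth would exceed the threshold of their input port, whereas the configuration in Fig. 8 uses WRED; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ThresholdQueue:
    """One FIFO queue shared by several input ports with per-port drop thresholds."""

    def __init__(self, thresholds: Dict[int, int]) -> None:
        self.thresholds = thresholds               # input port -> threshold in bytes
        self.queue: Deque[Tuple[int, int]] = deque()
        self.depth = 0                             # bytes currently queued

    def enqueue(self, port: int, size: int) -> bool:
        """Queue the packet unless it would push the depth past its threshold."""
        if self.depth + size > self.thresholds[port]:
            return False                           # packet is discarded
        self.queue.append((port, size))
        self.depth += size
        return True

    def dequeue(self) -> Optional[Tuple[int, int]]:
        """Remove and return the oldest (port, size) entry, or None if empty."""
        if not self.queue:
            return None
        port, size = self.queue.popleft()
        self.depth -= size
        return port, size


# Ports 1 to 3 stand for streams AF11, AF12, and AF13 (sizes in bytes).
af1_queue = ThresholdQueue({1: 100_000, 2: 80_000, 3: 60_000})
```

Because all three streams share one FIFO, packets are never reordered within the AF1 class; only the point at which arrivals start to be dropped differs.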








Two Rule-Based Building-Block Architectures for Policy-Based Network Control



Classification
    DSCP==12?    -> pipe AF11
    DSCP==14?    -> pipe AF12
    DSCP==16?    -> pipe AF13
    otherwise    -> pipe BE

Scheduling1 (inputs: AF11, AF12, AF13)
    DropAlgorithm = WRED
    Qmax = 100kB (AF11), Qmax = 80kB (AF12), Qmax = 60kB (AF13)

Scheduling2 (input: BE)

Scheduling3 (inputs: Scheduling1, Scheduling2)
    Algorithm = WRR

(a) A pipe-connection model

Classification
    if DSCP==12 then
        Label = D1
    if DSCP==14 then
        Label = D2
    if DSCP==16 then
        Label = D3
    otherwise
        Label = S2

RandomDiscarder
    if Label == D1 then
        Label = S1
        DropAlg = WRED
        Qmax = 100kB
    if Label == D2 then
        Label = S1
        DropAlg = WRED
        Qmax = 80kB
    if Label == D3 then
        Label = S1
        DropAlg = WRED
        Qmax = 60kB

Scheduling1
    if Label == S1 then
        Label = S2
        Enqueue

Scheduling2
    if Label == S2 then
        Enqueue

Scheduling3
    if Label == S2 then
        Algorithm = WRR

(b) A label-connection model

Fig. 8. Core interface configuration for an AF service



Scheduling1 and Scheduling2, which correspond to queues, enter packets into
queues, and Scheduling3, which corresponds to a packet scheduler, pulls off packets
from Scheduling1 and Scheduling2 according to the scheduling algorithm.
Scheduling1 and Scheduling2 in the label-connection model contain no other actions
than replacing labels. However, they are described here because they represent
necessary queues.



5 Comparisons 

Five major differences between the two building block architectures are explained 
below. 

1. Rule structures: A rule in the label-connection architecture consists of if-then
rules, but a rule in the pipe-connection architecture does not necessarily consist of
if-then rules. If-then rules can be simulated by using the pipe-connection
architecture, but the syntax of a rule is different: it consists of building blocks
instead of a condition and an action. A building block in SNAP consists of a guard
and a body, and a guard is similar to a condition and a body is similar to an action.
However, their semantics are different. Thus, if the pipe-connection architecture is
to be used, a method that makes policy development easier must be developed.

2. Control-flow specification: Explicit control flow is not necessarily specified in the
pipe-connection architecture because the control flow is derived from the dataflow
that is specified by pipes. However, the label-connection architecture needs a
loose control-flow specification because the control flow is not uniquely specified
by tags on packets. Thus, the execution order of policies must be explicitly
specified in the label-connection architecture.

3. Multiple I/O ports and modularity: In the pipe-connection architecture, building
blocks with multiple input or multiple output ports are required, but in the
label-connection architecture, only one input port and one output port are required. In the
former, if there are two or more inputs/outputs with different roles, they must be
distinguished by ports. Thus, multiple ports are necessary. In contrast, in the
latter, different roles are not distinguished by ports but are distinguished by flow
labels. This difference in roles causes the difference in the modularity of
schedulers or mergers. A scheduler (for Diffserv) usually inputs packets from two
or more rules. In a pipe-connection model, each output port of the rules must be
connected to an input port of the scheduler by a pipe. Thus, the number of input
ports must be incremented when a rule is added. However, in a label-connection
model, the number of input ports is always one. Thus, there is no need to modify
the scheduling rule. A merger is only used in pipe-connection models. Its
number of input ports must also be incremented when a rule is added. In
contrast, flows are merged implicitly in a label-connection model.

4. Tag usage: There are two differences in tag usage in the two architectures. One
difference is that each pipe must have a unique tag in the pipe-connection
architecture, but the same labels and tags can be used multiple times in the
label-connection architecture.^ Thus, DSCPs can be used as flow labels. After marking,
DSCPs are usually not changed within a Diffserv domain. So they are not unique
in a set of policies. For the same reason, MPLS EXPs or labels may also be used
as flow labels. The other difference in the usage of tags is that multiple tags can be
attached to a flow or packet in the label-connection architecture, but only one tag
can be attached to a pipe in the pipe-connection architecture. If different
parameters are applied to different flows, the flows must be inputted to different
input ports. This condition causes the difference in schedulers in the Diffserv
examples. Parameters for schedulers, such as a WRED parameter (e.g., QMax =
100 kB) or scheduling priority (e.g., Priority = high in Fig. 3), must be defined in
the scheduling rule (not in the queuing rule) in the pipe-connection architecture.
In contrast, in the label-connection architecture, such parameters can be given
as different tags. So a set of WRED parameters can be specified by a random
discarder (see Fig. 8) instead of embedding them into a scheduler. Thus, the
building blocks can be smaller and the design of building blocks is more flexible in
the label-connection architecture.

5. Parallelism: Constraints on parallelism should be as few as possible in network
environments. A pipe-connection model can be executed in parallel unless the
order of packets must be preserved before scheduling, but a label-connection
model is a sequential execution model and parallelization has two difficulties. One
is that label and attribute assignments must not be reordered because
reordering them may change the semantics. The other difficulty is that the
execution order of rule sets is specified in a label-connection model, and this
specification constrains the parallel execution. (This constraint is caused by the
second difference.) SNAP, the language for the pipe-connection architecture, is
based on parallel logic programming languages for describing parallel processing
programs.

^ If flow labels are unique, all the rules can be put in a rule set and there is no need to specify
the control flow. Then, the label-connection model is very similar to a pipe-connection
model.

Differences 1, 3, and 4 indicate that the label-connection architecture is
superior. It is easier and advantageous to move from a conventional
policy-based architecture to the label-connection architecture to make the
policy-based system more general-purpose. However, difference 5 indicates that
the pipe-connection architecture is superior. Although it is not easy to move
from the conventional architecture to the pipe-connection architecture, the
author believes there are reasons to do so: parallelism is necessary, and the
parallel execution semantics must be clear.



6 Conclusion 

Two rule-based building-block architectures for modeling a set of policies, the
pipe-connection architecture and the label-connection architecture, have been
developed. The label-connection architecture was found to be preferable at
present, but the pipe-connection architecture is better with regard to
parallelism, which is very important. Thus, the label-connection architecture
is the solution that can be used right now, but the pipe-connection
architecture will become more useful. That is, if its disadvantages can be
eliminated by further study, the pipe-connection architecture may become the
right solution.

Abstract. Currently two main approaches exist to realize Quality of Service in
the Internet. While Integrated Services are based on end-to-end sessions,
Differentiated Services focus on traffic aggregates. Additionally, higher-level
architectures such as Bandwidth Brokers are being developed. All these concepts
have their pros and cons, and a seamless interaction is desirable but complex
because of the heterogeneity and dynamics of the Internet. To cope with these
problems, an approach based on Active Networks is proposed. This paper presents
an approach that uses Active Networking technology to combine the two main
resource reservation techniques, Integrated and Differentiated Services, and
describes a prototype implementation for evaluation purposes.



1 Introduction 

There is an ongoing discussion about realising Quality of Service in the
Internet. One approach was the development of Integrated Services based on the
Resource Reservation Setup Protocol (RSVP). This protocol reserves resources
for a single TCP or UDP flow, causing every RSVP-capable router to store
information about this flow, allocate resources, and initiate traffic control
components or queueing systems. Although this works well in small and
medium-sized networks, it does not scale in Internet backbones. On the other
hand, RSVP is able to guarantee bandwidth and delay on a per-flow basis,
fitting the needs of modern real-time applications.

The alternative concept for an Internet supporting Quality of Service is
Differentiated Services [Nic98], [BBC+98]. The basic idea is the implementation
of different traffic classes in the Internet. The classes are distinguished by
the Differentiated Services Code Point (DSCP) in the ToS byte of IP packets.
According to its DSCP, a packet is put into queues with different priorities or
dropping algorithms (e.g. see [HBWW99]), resulting in different forwarding
behaviour. Every DS-capable host or network may apply, according to its Service
Level Agreement (SLA), certain types of service to the packets leaving its
domain. The performance of Differentiated Services depends crucially on good
network provisioning, which can be provided in the backbone.

Besides these two concepts, Internet Service Providers (ISPs) may use their own
methods for traffic engineering. To cope with the dynamics arising from the
necessary reservation mappings, the benefits of distributed systems are
obvious. In the following section we present concepts for the mapping and
interaction of different reservation techniques, focusing on a flexible
approach for the evaluation of such or similar mobile agent systems in large
networks.



2 Active Networking for QoS Support in the Internet 

The use of mobile agents seems a promising approach for the management of
reservation mapping in large networks. In this section we present the problems
arising from the use of different reservation methods and propose an
agent-based architecture to cope with them.

2.1 Reservation Domains and ISP Service Mapping 

The points of interest for the mapping of resource reservations are the borders
of regions supporting or favouring different reservation protocols. Such a
region is not necessarily the entire network of an ISP: even within an ISP it
might make sense to provide several areas supporting different reservation
methods. We therefore do not refer to an ISP's network but to a so-called
Reservation Domain (RD), representing a certain topology regarding reservation
protocols or administration. Unfortunately, there are some general problems
with the mapping of resource reservations to other types or reservation
concepts.

- The borders of the reservation domains may change. Most probably they will
be located at the border routers of an ISP, but no ISP is forced to provide a
homogeneous network working with one type of service across its whole
topology. An ISP may therefore include several reservation domains, each
providing a different resource reservation method. It may offer its customers
RSVP in the access networks but prefer Differentiated Services in the backbone
because of scalability, or allow only certain users to use specific
reservation methods.

- The path of the packets may depend on the desired Quality of Service. This
routing may not be stable either, but may require a dynamic allocation of
paths at the transition points between two RDs depending on current load and
available bandwidth. The costs of data transport may also play an important
role. A provider may also establish such transition points within its network
to achieve better load sharing.

- Because traffic aggregation is desired, multiple flows should be aggregated
into one large reservation. These aggregate reservations should live as long
as possible to minimize management overhead.

- Information about the amount and type of requested resources might get lost
during mapping because of incompatibilities in traffic specifications, so as
few mappings as possible should be applied to a flow or aggregate during its
transport over several RDs. For example, RSVP uses a detailed description of a
single reservation, while Differentiated Services acts only on aggregates, so
information about a specific flow may get lost when mapping RSVP to
Differentiated Services.

Figure 1 shows two ISPs with their interior routers (black) and border routers
(white). Typically the reservation method changes at the border router of an
ISP, so a reservation for a flow forwarded from ISP A to ISP B might have to be
mapped at the border routers. As already mentioned, one problem at this point
is that there are no established standards for describing a reservation that
are compatible across the different concepts of resource reservation. RSVP, for
example, uses a quite detailed flow specification, the flowspec [Wro97]:



    flowspec = {r, b, p, m, M}

with the token bucket rate r, the token bucket size b, the peak data rate p,
the minimum policed unit m and the maximum packet size M.
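
To make the five flowspec parameters concrete, the following minimal sketch
stores them in a small record and evaluates the usual token-bucket envelope,
i.e. the most data a conforming flow may send in any interval of length t. The
class name, the method name, the units and the example values are assumptions
made for illustration and are not taken from [Wro97].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowSpec:
    """Token-bucket flow specification {r, b, p, m, M} (units are assumed)."""
    r: float  # token bucket rate, bytes per second
    b: float  # token bucket size, bytes
    p: float  # peak data rate, bytes per second
    m: int    # minimum policed unit, bytes
    M: int    # maximum packet size, bytes

    def max_bytes(self, t: float) -> float:
        """Upper bound on the data sent in any interval of t seconds.

        The flow is limited both by its peak rate (plus one maximum-size
        packet) and by the token bucket; the tighter bound applies.
        """
        return min(self.p * t + self.M, self.r * t + self.b)


# Hypothetical example flow: 1 Mbit/s sustained, 2 Mbit/s peak.
spec = FlowSpec(r=125_000, b=10_000, p=250_000, m=64, M=1500)
short_interval = spec.max_bytes(0.01)  # limited by the peak rate
long_interval = spec.max_bytes(1.0)    # limited by the token bucket
print(short_interval, long_interval)
```

Over short intervals the peak rate is the binding limit, over long intervals
the token bucket is; Differentiated Services has no per-flow counterpart to
either bound.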




Fig. 1. Two ISPs as RDs with their border routers (white) 



Unfortunately, the Differentiated Services framework does not describe services
in as much detail as RSVP. Services are defined by a so-called per-hop
behaviour (PHB). A PHB describes how a packet has to be treated during
forwarding, e.g. which kinds of queueing strategies have to be applied. The
currently proposed standards mainly include a PHB offering minimum delay and
guaranteed bandwidth for the realisation of leased-line-like services, called
Expedited Forwarding (EF) [JNP99], and several classes for flows with multiple
drop precedences, called Assured Forwarding (AF) [HBWW99]. Furthermore, these
proposals do not have to be realised by each ISP, and even the DSCP values for
the same service may change from ISP to ISP. So at the border of a Reservation
Domain (RD) several mappings may be necessary. The RSVP to DiffServ mapping is
one of the most popular cases [BBBG00]. Because of scalability issues, an ISP
will be interested in mapping RSVP reservations to Differentiated Services to
decrease the load on its backbone routers. Because the realisation of
Differentiated Services may differ between two ISPs, a mapping of the DSCPs
might also be necessary.
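
The loss of information in an RSVP to DiffServ mapping can be illustrated with
a small sketch. The code points used are the standard ones for Expedited
Forwarding (46) and Assured Forwarding class 1 (10, 12, 14); the selection
policy itself, including its thresholds and the function name, is purely
hypothetical and only serves to show that very different flowspecs collapse
onto the same DSCP.

```python
EF = 46                        # Expedited Forwarding code point
AF11, AF12, AF13 = 10, 12, 14  # Assured Forwarding class 1, drop precedence 1..3


def flowspec_to_dscp(r: float, b: float, p: float) -> int:
    """Pick a code point from rate r, bucket size b and peak rate p.

    Hypothetical policy: near-constant-rate flows get EF, all other flows get
    AF class 1 with a drop precedence that grows with their burstiness.
    """
    if p <= 1.1 * r:
        return EF
    burst_seconds = b / r  # how long a full bucket lasts at rate r
    if burst_seconds < 0.05:
        return AF11
    if burst_seconds < 0.5:
        return AF12
    return AF13


voice = flowspec_to_dscp(r=8_000, b=200, p=8_000)
video = flowspec_to_dscp(r=125_000, b=10_000, p=250_000)
bulk = flowspec_to_dscp(r=1_000_000, b=100_000, p=5_000_000)
print(voice, video, bulk)
```

In this example the video and bulk flows differ by almost an order of magnitude
in rate, yet both end up in the same aggregate, so the original flowspec cannot
be recovered at the next domain border.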

2.2 Mobile Agents and Service Mapping 

The translation of a resource reservation type at the border of a Reservation 
Domain (RD) faces some problems. 

- An RD may depend on the type of an incoming reservation. An RSVP to DS
conversion might be desirable at an ISP's border router to relieve its
backbone routers of the RSVP processing load, while a mapping to
Differentiated Services is necessary when entering the ISP's backbone using
some specific reservation scheme (see also [TH98]).

- Certain reservation mapping methods require a specific setup of a router
pair. For example, two routers have to cooperate to set up a tunnel through an
ISP's network (e.g. aggregated RSVP [GBH97]). Especially in large networks it
might not be reasonable to manage these setups centrally; instead, cooperation
is left to the endpoints.

- A new mapping may require the configuration of intermediate routers, such as
the setup of certain queueing disciplines. A central approach would have to
determine the path, causing a lot of management overhead, while an ingress
router may simply launch a capsule to the egress point that configures the
devices on its way through the network.

- The mapping components have to access policing information for specific
flows. It would be desirable to store this information as close as possible to
the agent.

These reasons make the benefits of distributed solutions such as those provided
by Active Networking obvious. In the following we describe an architecture for
the mapping of resource reservations over several Reservation Domains.

One of the big advantages mobile code can provide is its self-organizing
capability. This feature allows programs to be transmitted through the network,
to settle at places of interest and to perform specific actions there.

Once injected into the network, capsules establish agents by searching the
network for significant differences in reservation handling between two
neighbouring nodes or administrative regions.

Once an agent is established, it listens to traffic being transmitted to its
RD so that it can react to incoming reservation requests (see figure 2). The
network inside such a domain can be assumed to be somewhat homogeneous
concerning the available resource reservation methods.



Fig. 2. Reservation Domains with Agents located at the RD borders



DiffServ to DiffServ Mapping. After the injection of capsules, agents occupy
the entry points of the homogeneous reservation domain. Each router is
configured to announce every DS packet to the agent. The agent determines an
appropriate mapping for the packet, reconfigures the local router to perform
this mapping and forwards a capsule along the packet's path through the
reservation domain in order to configure appropriate scheduling mechanisms in
each router (see [BB00]). At the egress point of the RD another agent checks
whether another DSCP translation is necessary to meet the needs of the next RD
(see section 2.3). There are several ways in which the agent can decide how to
map DSCPs.

encoding: The mapping scheme is encoded in the agent itself, requiring an
update of the agents whenever DSCPs are changed.
policy server: The agent asks a central instance (e.g. a policy server) to
determine a proper mapping. The agents then have to cache information about
DSCP mappings to minimize the communication overhead.
negotiation: Probably the most favourable way is a negotiation between
neighbouring agents about the most appropriate mapping scheme. To achieve
this, the agents need some knowledge about the PHBs behind the DSCPs and
methods to determine corresponding PHBs.
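
The first two alternatives can be sketched together: an agent first consults
the mapping it carries itself and only then asks a policy server, caching the
answer so that later packets with the same DSCP cause no further communication.
The class, its methods and the toy policy server below are hypothetical; the
text does not prescribe an interface.

```python
from typing import Callable, Dict, Optional


class DscpMapper:
    """Agent-side DSCP translation with an encoded table and a server cache."""

    def __init__(self, encoded: Dict[int, int],
                 policy_server: Optional[Callable[[int], int]] = None):
        self.encoded = dict(encoded)     # mapping carried by the agent itself
        self.policy_server = policy_server
        self.cache: Dict[int, int] = {}  # answers already fetched from the server
        self.server_queries = 0

    def map(self, dscp: int) -> int:
        if dscp in self.encoded:
            return self.encoded[dscp]
        if dscp in self.cache:
            return self.cache[dscp]
        if self.policy_server is None:
            return dscp                  # no information: keep the code point
        self.server_queries += 1
        self.cache[dscp] = self.policy_server(dscp)
        return self.cache[dscp]


def toy_policy_server(dscp: int) -> int:
    # Hypothetical policy: AF11 (10) becomes AF21 (18), the rest best effort (0).
    return {10: 18}.get(dscp, 0)


agent = DscpMapper(encoded={46: 46}, policy_server=toy_policy_server)
results = [agent.map(d) for d in (46, 10, 10, 10)]
print(results, agent.server_queries)
```

Three packets marked with DSCP 10 cause a single query; the negotiation
alternative would replace the policy server callback by an exchange with the
neighbouring agent.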

Once a mapping and the schedulers are installed, each succeeding packet with
the same DSCP can be handled properly without additional overhead. The setup of
the DSCP mapping and the queueing may have a limited lifetime, so that after a
certain idle period the queueing system and translation units are removed
automatically. Whether it makes sense to use mobile code for the realisation of
special queueing components has to be evaluated. If only preconfigured service
classes are used, the agents only have to reconfigure the DSCP mapping. As
described in the next subsection, it might also make sense to set up tunnels
between the ingress and egress points of an RD. Even if the mapping of certain
DSCPs between ingress and egress points has no advantages regarding
scalability, it simplifies the setup of queueing components significantly.
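
The idle-period removal described above amounts to soft state. A minimal
sketch, assuming a table keyed by DSCP and an explicitly passed clock value
(both hypothetical choices made to keep the example deterministic):

```python
class ExpiringMappings:
    """DSCP mappings that are removed after an idle period (soft state)."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self._entries = {}  # dscp -> (mapped dscp, time of last use)

    def install(self, dscp: int, mapped: int, now: float) -> None:
        self._entries[dscp] = (mapped, now)

    def lookup(self, dscp: int, now: float):
        entry = self._entries.get(dscp)
        if entry is None:
            return None
        self._entries[dscp] = (entry[0], now)  # every use refreshes the entry
        return entry[0]

    def expire(self, now: float):
        """Remove and return all mappings idle for longer than the timeout."""
        stale = [d for d, (_, last) in self._entries.items()
                 if now - last > self.idle_timeout]
        for d in stale:
            del self._entries[d]
        return stale


table = ExpiringMappings(idle_timeout=30.0)
table.install(46, 46, now=0.0)
table.install(10, 18, now=0.0)
table.lookup(46, now=20.0)        # traffic with DSCP 46 keeps its entry alive
removed = table.expire(now=35.0)  # DSCP 10 has been idle for 35 s
print(removed)
```

In a router the same timeout would also tear down the associated queueing
components; here only the translation entry is modelled.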

RSVP to DiffServ Mapping. A more complex task than the support of different
DSCPs is RSVP-based QoS support. RSVP is based on an end-to-end scheme, so the
RSVP messages used for the setup of a reservation have to be transported
through the RD. Figure 3 shows the messages exchanged during an RSVP setup. The
two main steps performed by RSVP are:

a) transmission of the path-message to notify each router on the path from
sender to receiver. This message is sent using the Router Alert mechanism
[Kat97].

b) transmission of the resv-message from router to router along the previously
detected path. At this step the appropriate resources are reserved in every
router.

The basic idea is now to set up reservations between ingress and egress points.
Several RSVP reservations are aggregated into a bigger reservation. This
traffic aggregate is transported through an IP-over-IP tunnel to the
corresponding egress point (see figure 4), which prevents RSVP messages from
interacting with devices in the RD and simplifies resource management
considerably. Inside the RD, service allocation only has to be done for flows
between tunnel endpoints [GBH97]. All packets transported from (B) to (E) are
encapsulated and have the same source and destination address, which simplifies
the setup of appropriate resources within the RD.

The setup of the tunnel is proposed to be done by mobile code. When a tunnel
has to be set up or resized, the ingress point sends a capsule to the egress
point, which allocates resources for this tunnel within each intermediate
router and also initiates the decapsulation at the RD's egress router. A
reliable transmission of the capsule to the destination can be achieved by
forwarding the capsule with low dropping precedence and by implementing some
kind of simple transport protocol either in the capsule itself or in the
capsule interpreter.

Because the agents act only on traffic aggregates and use some heuristics,
changes to the tunnels occur quite infrequently, minimizing the load for
processing reconfiguration capsules (see [DGB00]). An RSVP reservation only has
to be processed at the ingress and the egress point of the RD, decreasing the
delay to set up the reservation.
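
One possible heuristic of this kind, given here only as an illustrative sketch,
is to size the tunnel in fixed chunks and to shrink it lazily. The chunk size,
the shrink threshold and all names are assumptions; the heuristics actually
used are described in [DGB00].

```python
class TunnelAggregate:
    """Aggregate of RSVP reservations carried in one tunnel, resized in chunks."""

    def __init__(self, chunk: int):
        self.chunk = chunk      # resize granularity, e.g. in kbit/s
        self.flows = {}         # flow id -> reserved rate
        self.reserved = 0       # capacity currently reserved for the tunnel
        self.capsules_sent = 0  # reconfiguration capsules launched so far

    def _resize_if_needed(self) -> None:
        demand = sum(self.flows.values())
        # Demand rounded up to a whole number of chunks.
        target = -(-demand // self.chunk) * self.chunk
        # Grow as soon as demand exceeds the reservation; shrink only when at
        # least two chunks are unused, so small fluctuations cause no capsule.
        if demand > self.reserved or self.reserved - demand >= 2 * self.chunk:
            self.reserved = target
            self.capsules_sent += 1

    def add_flow(self, flow_id: str, rate: int) -> None:
        self.flows[flow_id] = rate
        self._resize_if_needed()

    def remove_flow(self, flow_id: str) -> None:
        self.flows.pop(flow_id, None)
        self._resize_if_needed()


tunnel = TunnelAggregate(chunk=1000)
tunnel.add_flow("f1", 300)  # tunnel grows to 1000
tunnel.add_flow("f2", 400)  # fits, no capsule
tunnel.add_flow("f3", 500)  # tunnel grows to 2000
peak_reservation = tunnel.reserved
for flow_id in ("f3", "f2", "f1"):
    tunnel.remove_flow(flow_id)
print(peak_reservation, tunnel.reserved, tunnel.capsules_sent)
```

Six per-flow events lead to only three reconfiguration capsules in this
example, at the price of temporarily reserving more than the flows need.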

Fig. 4. Tunnels set up by agents at ingress points using capsules to configure
QoS tunnels through an RD



As can be seen in figure 3, there are two points in the RSVP message handling
process that we can use for the RD signalling. We can use the path-message (1)
entering the RD to establish the connection, or use the resv-message (3) to set
up the resources. The path-message is sent earlier but contains only
preliminary information about the requested resources, so a user might alter
these data or reserve anything. Only after the resv-message (3) has been
received can we be sure that the user has accepted the reservation.

2.3 Agent Interaction 

As can be seen in figure 2, ideally agents have settled on both sides of an RD
border to perform the reservation translation according to their own and their
neighbours' needs. Until now we have neglected communication between the agents
on each side of such a border. Agents can exchange information about the
resource reservation techniques used in the neighbouring RDs.

Based on this knowledge an agent can decide on the appropriate method of
resource reservation or cooperate with other agents to set up, for example,
quality of service tunnels over multiple RDs.

For a central instance it might be hard to supervise all border routers,
collect information about the reservation methods available in neighbouring
domains, and re-negotiate and configure border routers and tunnels for optimal
traffic transport.

As can be seen in figure 4, an agent pair within an RD can use tunnels to set
up resources between two border routers (E, F and G, H). Each of these border
routers needs resources for encapsulating and decapsulating packets. Of course
this does not make sense when all the traffic leaving the tunnel at E is split
up into single flows, encapsulated again by E and decapsulated by G.

An alternative is the extension of the tunnel over multiple domains. When an
agent at B starts to set up a tunnel, it forwards the corresponding capsule as
described in section 2.2. When the mobile code reaches the egress point, it
negotiates with an agent at F about an appropriate mapping scheme. If F
considers G to be an optimal endpoint for the tunnel, it might propose this to
E. E transmits the new tunnel endpoint to B, which sets up the proper
encapsulation. Even though the tunnel now spans multiple RDs, the underlying
resource reservation methods remain tasks of each RD's agents. So a capsule
sent from B to E sets up resources within their RD, while a capsule from E to G
configures their intermediate routers.

The advantage of such a concept is the very local processing of data and the
very limited communication across an RD's borders. Agents established by
capsules sent to a border router only interact with their direct neighbours,
causing only very local traffic. Additionally this simplifies the supervision
of the RD, since no foreign capsules have to be executed or even transported
through an RD. This is especially important when an RD is equivalent to an ISP.



3 Concept Evaluation by Virtual Routers 

3.1 Virtual Router Architecture 

A general problem in research on networking is the demand for test beds of
sufficient size and complexity to show the desired results or to prove a new
concept. Alternatively, network simulators like ns [ns], [BEE+00] or OpNet
[opn] can be used to prototype a device or a protocol in the special simulation
environment and to run the desired tests. So, the simulation normally precedes
the setup of the test scenario in a laboratory. Unfortunately, problems occur
when combining different technologies. A simulation approach for combining IP
forwarding and shaping components with Active Networking is at least not
trivial. To counter these problems we developed an approach to emulate complete
(active) routers including the appropriate IP forwarding. This allows the
combination of real hardware and routers with large emulated topologies using
several so-called Virtual Routers running on the same host.

Fig. 5. The components of a VR

The basic idea of combining real hardware with an emulated topology is shown in
figure 5. The core mechanism, the Virtual Router (VR), emulates a real IP
packet forwarder. Each VR has a number of interfaces attached. These interfaces
can be connected to interfaces of other VRs (dashed lines) on the same or on a
remote host. Alternatively, a VR's interfaces can be connected via Softlink
devices (sol*) to the network system of the local host.







Such a Softlink device acts as an interface between the operating system's
network layer and a VR. To the OS kernel and user space it looks like a normal
Ethernet device, transporting data to user space and vice versa. The network
layer of the host system cannot detect any difference between the real network
and the emulated topology. So, it is possible to define an emulated topology
consisting of multiple VRs distributed over several machines. Figure 6 shows
the resulting topology. The Softlink device is implemented as a Linux kernel
module for kernels > 2.2.12. The VR itself has been developed for Linux and
Solaris.

Virtual Routers are used to realize the network topology to be emulated.
Figure 7 shows the principal architecture of the program. In line with its
primary task, the architecture focuses on IP routing. The VR is implemented
entirely in plain C++, making the source extensible and easy to port.

The forwarding mechanism acts on standard routing rules, but was extended to
allow routing decisions based on source addresses, port numbers, protocol
fields and ToS values.
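
Such extended routing rules can be pictured as ordinary destination routes with
optional extra match fields. The following sketch is an assumption about how a
lookup of this kind might look; the rule layout, the first-match semantics and
the next-hop names are hypothetical and do not describe the VR's C++
implementation.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    dst: str                        # destination prefix, e.g. "10.2.0.0/16"
    next_hop: str
    src: Optional[str] = None       # optional source prefix
    protocol: Optional[int] = None  # IP protocol number (6 = TCP, 17 = UDP)
    dst_port: Optional[int] = None
    tos: Optional[int] = None       # value of the ToS byte

    def matches(self, pkt: dict) -> bool:
        if ip_address(pkt["dst"]) not in ip_network(self.dst):
            return False
        if self.src is not None and ip_address(pkt["src"]) not in ip_network(self.src):
            return False
        for name in ("protocol", "dst_port", "tos"):
            wanted = getattr(self, name)
            if wanted is not None and pkt.get(name) != wanted:
                return False
        return True


def route(rules, pkt):
    """Return the next hop of the first matching rule, or None."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.next_hop
    return None


rules = [
    Rule(dst="10.2.0.0/16", next_hop="vr3", protocol=17, tos=0xB8),  # EF-marked UDP
    Rule(dst="10.2.0.0/16", next_hop="vr2"),
    Rule(dst="0.0.0.0/0", next_hop="vr1"),
]
ef_udp = {"src": "10.1.0.5", "dst": "10.2.1.5", "protocol": 17, "dst_port": 5004, "tos": 0xB8}
plain_tcp = {"src": "10.1.0.5", "dst": "10.2.1.5", "protocol": 6, "dst_port": 80, "tos": 0}
print(route(rules, ef_udp), route(rules, plain_tcp))
```

Two packets to the same destination take different next hops because the more
specific rule also matches on protocol and ToS.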

As an interface to programs running on this virtual host, the VR has to provide
the usual IP stack with TCP, UDP and ICMP. So far, ICMP, UDP and a minimal
Capsule Interpreter (CIP) (see section 3.3) have been implemented on top of IP.

The main work regarding IP processing is assigned to the interface components
underlying the routing mechanism. Figure 7 shows two of them. Each interface
can be connected to a Softlink device, acting as a transition point to the real
network, or to another VR interface. For the connection to other VR interfaces
we use UDP.

220 Florian Baumgartner and Torsten Braun

Fig. 7. The components of a VR

Received data is first processed by an IP address translation unit (NAT). This
makes it possible to force a source to route traffic destined for the source
itself over the VR topology by using dummy destination addresses, which are
mapped to the address of the source in the first VR. This further reduces the
number of computers needed and simplifies the setup of large topologies. After
that step packets are delivered to the programmable filter (PF). This unit
controls the forwarding of certain packets to higher layers, as explained in
section 3.2.

When the VR acts as a sender, data is likewise passed through the host filter
and NAT and put into the queueing system before being transmitted by the
Softlink device or sent via UDP. A token bucket filter preceding the connector
is used to limit the maximum bandwidth of the interface.
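
The principle of such a token bucket filter is sketched below. This is a
generic textbook formulation with an explicitly passed clock; the class name
and parameters are assumptions, and the VR's own filter may differ, for example
by delaying non-conforming packets instead of rejecting them.

```python
class TokenBucket:
    """Generic token bucket: tokens accrue at `rate` up to a depth of `burst`."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate     # bytes per second
        self.burst = burst   # bucket depth in bytes
        self.tokens = burst  # the bucket starts full
        self.last = 0.0      # time of the previous update

    def allow(self, size: int, now: float) -> bool:
        """Return True and consume tokens if a packet of `size` bytes conforms."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


tbf = TokenBucket(rate=1000, burst=1500)
decisions = [
    tbf.allow(1500, now=0.0),  # uses the initial burst
    tbf.allow(500, now=0.25),  # only 250 tokens have accumulated
    tbf.allow(500, now=1.0),   # 1000 tokens are available by now
]
print(decisions)
```

Over long periods the accepted traffic cannot exceed the configured rate, while
short bursts up to the bucket depth pass unhindered.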

Because of its flexibility the queueing system is the most complex part of the
interface component. It consists of a pool of subcomponents such as queues,
filters, shapers and schedulers. The current implementation offers the
following components: a generic classifier, a Token Bucket Filter, a drop tail
queue, a Random Early Detection (RED) queue [FJ93], a Weighted Fair Queueing
(WFQ) scheduler, a simple Round Robin (RR) scheduler and a Priority Round Robin
(PRR) scheduler. Additional components are a RED queue with three drop
precedences (TRIO), a special marker for Differentiated Services and a Priority
Weighted Fair Queueing (PWFQ) scheduler for the implementation of Expedited
[JNP99] and Assured Forwarding [HBWW99].
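
As an example of one such component, the core of a RED queue can be sketched in
a few lines: an exponentially weighted average of the queue length is mapped to
a drop probability that rises linearly between two thresholds. The sketch is
simplified (it omits the correction for the number of packets since the last
drop used in [FJ93]), and the class and parameter names are assumptions.

```python
class RedEstimator:
    """Simplified RED: average queue length -> drop probability."""

    def __init__(self, min_th: float, max_th: float, max_p: float,
                 weight: float = 0.002):
        self.min_th = min_th  # below this average nothing is dropped
        self.max_th = max_th  # at or above this average everything is dropped
        self.max_p = max_p    # drop probability just below max_th
        self.weight = weight  # weight of the newest sample in the average
        self.avg = 0.0

    def update(self, queue_len: int) -> float:
        """Fold the current queue length into the average and return it."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def drop_probability(self) -> float:
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)


# weight=1.0 makes the average follow the queue length directly, which keeps
# the example easy to check by hand.
red = RedEstimator(min_th=5, max_th=15, max_p=0.1, weight=1.0)
probabilities = []
for queue_len in (3, 10, 20):
    red.update(queue_len)
    probabilities.append(red.drop_probability())
print(probabilities)
```

A queue with three drop precedences, as used for Assured Forwarding, can be
thought of as three such estimators with different thresholds sharing one
queue.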

The queueing system can be configured completely at runtime via an API or the
command line interface (CLI). The object-oriented implementation of the
queueing system and its components makes it easy to add or modify single
functionalities.

For the configuration of the VR a command line interface and an API have been
implemented. The API allows programs (e.g. a capsule interpreter) running on
the virtual router to alter interface settings, routing rules and so on. The
command line interface is accessible via console or external^ telnet.

3.2 Programmable Filter (PF) 

To allow the seamless processing of capsules and to control the access of
running agents to specific flows, all data transported by an interface is
processed by the programmable filter (PF). This filter is programmable via an
API and allows daemons running on the virtual router to gain access to flows.
An application determines which flows it receives by setting up a filter, which
is done by submitting a set of parameters to the PF. The set of supported
parameters covers most of the IP header fields and options. More than one
filter may be set up.



    PFspec = {(Des, DesMask), (Src, SrcMask), (Opt, OptVal), Protocol, ToS}

This concept has the advantage that an application only has to provide the
filter pattern once rather than process all incoming data. It simplifies the
setup of applications and speeds up the processing of normal traffic being
forwarded by the VR.
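
How a filter specification of this kind selects packets can be sketched as
follows. The field names mirror the specification above, but the class, the use
of dotted-quad strings and the omission of the option fields are simplifying
assumptions and do not reflect the PF's actual API.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


def masked_equal(addr: str, pattern: str, mask: str) -> bool:
    """Compare two IPv4 addresses on the bits selected by the mask."""
    m = int(IPv4Address(mask))
    return (int(IPv4Address(addr)) & m) == (int(IPv4Address(pattern)) & m)


@dataclass
class PFSpec:
    dst: str = "0.0.0.0"
    dst_mask: str = "0.0.0.0"       # an all-zero mask matches every address
    src: str = "0.0.0.0"
    src_mask: str = "0.0.0.0"
    protocol: Optional[int] = None  # None means "any"
    tos: Optional[int] = None

    def matches(self, pkt: dict) -> bool:
        return (masked_equal(pkt["dst"], self.dst, self.dst_mask)
                and masked_equal(pkt["src"], self.src, self.src_mask)
                and self.protocol in (None, pkt["protocol"])
                and self.tos in (None, pkt["tos"]))


# All UDP traffic towards 10.2.0.0/16, from any source and with any ToS value.
udp_to_net = PFSpec(dst="10.2.0.0", dst_mask="255.255.0.0", protocol=17)
pkt = {"src": "192.168.1.7", "dst": "10.2.3.4", "protocol": 17, "tos": 0}
print(udp_to_net.matches(pkt))
```

The application submits the pattern once; afterwards only matching packets
reach it, and all other traffic stays on the normal forwarding path.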

There are two main filter modes: copy and move. In copy mode a packet is
forwarded normally and a copy of it is passed to the upper layer. In move mode
the packet is not duplicated, but only passed to the HAL/IP layer (see figure
7). An application can thus process a packet and re-inject it afterwards or
discard it completely. In this way an application can override the queueing
systems and forwarding mechanisms of the VR. For example, a whole queueing
system may be applied to a certain type of packets without affecting the normal
processing of standard IP packets.
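
The difference between the two modes can be sketched with a small dispatch
function. The representation of filters as (predicate, mode, handler) triples
and the callback names are assumptions made for this illustration.

```python
def dispatch(pkt, filters, forward):
    """Hand a packet to the first matching filter, otherwise forward it.

    filters: list of (predicate, mode, handler) with mode "copy" or "move".
    forward: callable implementing the normal forwarding path.
    """
    for predicate, mode, handler in filters:
        if not predicate(pkt):
            continue
        if mode == "copy":
            handler(dict(pkt))  # the application receives its own copy
            forward(pkt)        # the original is forwarded as usual
        elif mode == "move":
            handler(pkt)        # the application decides: re-inject or drop
        else:
            raise ValueError(f"unknown filter mode: {mode}")
        return
    forward(pkt)


forwarded, monitored = [], []
filters = [
    (lambda p: p["protocol"] == 17, "copy", monitored.append),  # observe UDP
    (lambda p: p["tos"] == 0xB8, "move", lambda p: None),       # swallow EF-marked packets
]
packets = [
    {"id": 1, "protocol": 17, "tos": 0},
    {"id": 2, "protocol": 6, "tos": 0xB8},
    {"id": 3, "protocol": 6, "tos": 0},
]
for packet in packets:
    dispatch(packet, filters, forwarded.append)
print([p["id"] for p in forwarded], [p["id"] for p in monitored])
```

Packet 1 is both forwarded and seen by the monitoring handler, packet 2 leaves
the normal path entirely, and packet 3 is untouched by any filter.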

In addition to forwarding packets to applications according to the configured
filters, the PF can be configured to preprocess the packets. An application may
not be interested in the entire packet, but only in certain information. The PF
can truncate the packet body and provide only the header to the application. It
can also collect statistical data about a certain packet type, such as the
average, minimum and maximum bandwidths, the number of matching packets and so
on.

It has to be mentioned that the PF does not provide any security. The PF acts
on the filter setup and does not check whether the application is authorized to
access these data. If the PF is used to support a Capsule Interpreter (CIP),
the decision which capsule is allowed to set up which filter is left to the
CIP.

^ This telnet connection is not provided by the VR, but by the host running the
VR, making it independent of any changes made to the VR. This simplifies the
setup of multiple machines considerably, because no changes to the interface
setup or queueing mechanisms can harm the TCP connection used for the
configuration.

3.3 Capsule Interpreter

To support the exchange and execution of mobile code in a network of VRs, i.e.
to obtain a Virtual Active Router (VAR), a Capsule Interpreter (CIP) was
implemented. The CIP uses the PF to receive IP packets with the Router Alert
Option [Kat97] set. A packet with this option that is forwarded through the VAR
is also copied to the CIP, which executes the capsule's code.

For initial testing of the capsule forwarding mechanisms we implemented a very
simple TCL-based CIP. The approach was influenced less by architectural or
security issues than by the need to quickly implement a mechanism for
distributing scripts over a network of Virtual Active Routers. It was mainly
used to evaluate the concepts of programmable filters and VRs. The interfaces
to the Host Access/IP layer are simple, and it should be easy to port a more
advanced CIP to the VR in the future. In particular, we evaluated methods for
the setup and configuration of a VR's queueing system to support specific
flows.

The capsule format is comparable to the Active Network Encapsulation Protocol
(ANEP) [ABG+97]. TCL scripts with an ANEP-like header are carried in IP packets
using the Router Alert Option [Kat97] so that they are executed by each router
they pass. The CIP itself requests these packets through an appropriate
configuration of the PF. A capsule may set up additional filters at the PF to
process specific flows or to act on the arrival of certain packets. The CIP has
to control which data may be accessed by the capsule.
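
Since the exact capsule layout is not given here, the following sketch only
illustrates the general idea of a script carried behind a small fixed header.
The field layout (version, flags, type identifier, header length, packet
length) is loosely modelled on ANEP, but the field widths, the use of byte
counts for both lengths and the function names are assumptions.

```python
import struct

# Hypothetical header: version (1 byte), flags (1), type id (2),
# header length (2) and total packet length (2), in network byte order.
HEADER = struct.Struct("!BBHHH")


def build_capsule(script: str, type_id: int = 1) -> bytes:
    """Prefix a script with the header and return the capsule payload."""
    payload = script.encode("utf-8")
    header = HEADER.pack(1, 0, type_id, HEADER.size, HEADER.size + len(payload))
    return header + payload


def parse_capsule(data: bytes):
    """Return (type_id, script); raise ValueError on inconsistent lengths."""
    if len(data) < HEADER.size:
        raise ValueError("capsule shorter than header")
    _version, _flags, type_id, hdr_len, pkt_len = HEADER.unpack_from(data)
    if pkt_len != len(data) or hdr_len < HEADER.size or hdr_len > pkt_len:
        raise ValueError("malformed capsule")
    return type_id, data[hdr_len:pkt_len].decode("utf-8")


capsule = build_capsule('puts "configure queue"')
type_id, script = parse_capsule(capsule)
print(type_id, script)
```

A capsule interpreter would pass the recovered script to its interpreter only
after such consistency checks and after deciding which router functions the
capsule may access.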

4 Summary and Outlook 

The use of Active Networking technology offers great possibilities for the
management of large networks. Of course there is a lot of research still to do,
especially in the areas of security and performance. The combined approach of
using IP and AN assumed in this paper could offer the desired performance as
well as the enormous flexibility that only mobile code can provide.

The other important aspect is security. In our scenario capsules are only
launched by an ISP's own hosts and their activities are limited to the border
regions of their RD. Interaction between foreign agents only takes place either
within an ISP's RDs or between neighbouring ISPs, so security should be
achievable by applying common mechanisms like encryption and authentication via
asymmetric algorithms. The VAR platform will be well suited for large-scale
performance evaluation of the proposed service mapping concepts.

A future issue will be the information exchange between single agents to
implement a self-organizing distributed system that optimizes routing and
tunnel setup regarding QoS and costs.

Abstract. Future requirements for supporting distributed object applications by
active networks are discussed, and a middleware approach for active networking
is presented. A CORBA-based distributed processing environment (DPE) is
described as an interoperable service platform in an active network to enable
end-to-end QoS for distributed object communication. Three key platform
services are presented: QoS binding, component management and resource control.
A binding framework is enhanced to achieve transparent QoS binding; an active
component management service is proposed as out-of-band signaling to install
service objects; and active node resources are adaptively managed to support
generic reservation requirements. As a whole, the paper presents a distributed
computing model for active networks in which active services can be dynamically
deployed as downloadable objects to apply different QoS architectures on
demand. With this model, distributed object-oriented systems directly benefit
from active networking technology with respect to their QoS needs.



1 Introduction 

Programmable Networks [6] promote open architectures and standard interfaces for 
flexible service provision to enable novel service architectures by Internet Service 
Vendors (ISV). Active Networks [18] allow dynamic customization and re- 
configuration of a network by means of secure code injection in a network. As a re- 
configuration example, service modules can be encapsulated in the form of code or a 
composition of codes, and dynamically installed or updated. Therefore, it greatly 
increases the flexibility of service deployment. Harmonization of the two approaches 
within a distributed object framework therefore facilitates deployment of services, 
either application-specific or generic, to better support distributed object applications 
than today, for example in terms of Quality of Service (QoS). The framework can be seen as
an interoperable (e.g. by IIOP) distributed platform for running active services. This
paper presents a technical approach to show how active networking can be used to 
improve the execution of distributed applications, with the help of a middleware 
bridge. 

Providing end-to-end QoS for distributed object systems is the major focus of our
approach. In existing frameworks such as CORBA, providing QoS for the communi- 
cation between distributed objects is challenging due to the difficulty of deploying a 
generic QoS architecture [7], which might require an integration of appropriate
architectures such as Integrated Services [15] and Differentiated Services [10] across
the network to obtain an end-to-end QoS guarantee. Moreover, in many cases the
QoS requirements of application objects cannot be mapped or translated into the
network QoS parameters supported by different protocols, e.g. RSVP [16]. Thus QoS
solutions [1] in the middleware normally rely on, and are closely coupled with,
particular network-layer QoS mechanisms. Related work includes the Active Reservation
Protocol [3], which enables portable signaling software but is bound to Java and is
therefore limited by Java security and interoperability, and Xbind [12], which uses
CORBA as the platform for programmable value-added services but is limited by existing
ORB implementations with regard to QoS support.

In the following the paper summarizes our design in the “Broadband Active Next 
Generation” [4] project. We developed an active middleware framework that supports 
end-to-end QoS for a wide range of distributed applications. It represents a distributed 
computing model for active networks so that active services can be dynamically de- 
ployed as downloadable objects to apply different QoS architectures on demand. The 
framework mainly refers to an active distributed processing environment (DPE) that is 
an execution environment (EE) based on an enhanced Object Request Broker (ORB). 

In general, the role of a DPE is to ease the development of distributed applications. 
An application object can access the interface of other objects without knowing the 
location of those potentially remote objects. The DPE is used to gain access to
interfaces, that is, to set up a communication path between objects. The ODP Reference
Model [13] defines a generic model for distributed processing, and standards such as
CORBA [8] specify a concrete architecture supporting this. Since CORBA is limited
in the types of object communication it supports, the more open Jonathan architecture
developed during the ReTINA project [9][11] is used as a basis.

Another important role of the DPE is to manage resources in a distributed way so 
that an end-to-end QoS can be achieved. This includes management of processing 
resources as well as network resources both in end-systems and network nodes. In 
traditional router architectures, network resources are managed in a best-effort and
rigid fashion. Programmable routers open the internal router details through
object-oriented interfaces, enabling delivery of novel services as software packages by
third parties. These new services should be highly customizable. Network resources thus
need to be controlled in a fine-grained manner and could be bound to a service
dynamically. A generic router resource interface [5] based on the programmable interfaces
being proposed by IEEE P1520 [6] makes this possible by providing generic
abstractions and dynamic binding capability. In an active network, moreover,
service modules share the resources in parallel at run time, so their access needs to be
synchronized to prevent conflicts. To make the best use of the limited resources in a
router, more intelligent allocation of resources is preferred over fixed partitioning and
reservation. This is particularly important for bandwidth, which is becoming a
commodity traded by auction.

The DPE also allows dynamic deployment of components, which are downloaded
to an active node and run on the node's DPE. Security constraints must cover both
what a downloaded component is allowed to do and what resources it may consume.
The dynamic deployment of components requires a special installation service
as part of a computing model for active networks [2].

Together these form the basis for higher-level reservation services. Because of the
dynamic deployment, those services are highly customizable by means of updating or
exchanging components. By defining interfaces between instances of a service on
different network nodes, the service can provide network-wide resource control.
Policies are used to regulate resource usage: user identities, time slots, priorities, etc.
are used to achieve efficient multiplexing of the available physical resources.

The next sections describe these parts of the active DPE. The binding framework is 
used to set up communication paths to remote objects, the resource control framework 
is used to get generic and thus fine-grained access to resources, the installation service
allows the dynamic deployment of components, and finally a distributed reservation 
service is outlined to show how to take advantage of an active DPE to provide end-to- 
end QoS for distributed object applications. 



2 Binding Framework 

Communication between objects supported by the framework is through bindings, 
which are created by object adapters. In this framework, the notion of object adapter 
is overloaded and extended to allow the explicit binding of objects: the explicit crea- 
tion of a binding between different interfaces is realized by invoking an operation on 
an object adapter. Object adapters are binding factories.

In contrast to the CORBA architecture which identifies an ORB core responsible 
for the conveyance of operation requests and replies, the notion of object adapter in 
this framework is extended to cover also communication aspects, which may thus vary 
from object adapter to object adapter. In summary, an object adapter is not limited to 
cover the server side as in the standard CORBA specification, but actually extends to 
the client side. The notion of ORB core in CORBA can be recovered as a specific, 
default object adapter that can be combined with other object adapters. 

This flexibility makes it possible to define a special binding factory which understands
additional parameters, such as QoS requirements, for the creation of a binding. This
binding can then provide an interface that lets application objects change its behavior
dynamically and register for notifications about status changes.

The binding framework consists of a set of abstractions for the construction of ar- 
bitrary communication stacks and abstractions for the construction of protocol- 
independent operational stubs. Communication abstractions comprise: 

• Protocols: these are abstractions of protocol machines at a given site; they manage 
the establishment and release of sessions. 

• Sessions: these are logical communication channels that obey a particular commu- 
nication protocol; sessions in different capsules exchange messages. 

• Messages: these are abstractions of data exchanged between capsules. 

Fig. 1. QoS involved in object binding



Protocol-independent stubs described in the binding framework provide generic 
interfaces for operational bindings. They can be specialized to derive more specific 
forms of operational bindings. Stubs are at the interface between the untyped world of 
protocols and the typed world of language bindings. 




Fig. 2. Architecture of a binding (client side) 

The open architecture of the binding framework allows the easy insertion of com- 
ponents into the communication stack (see Figure 2). For the support of QoS the 
binding factory pushes a wrapping session on top of the session stack. The wrapping 
session interacts with the local and remote resource management. Additionally, a
special controller is introduced to control the behavior of the binding
and allow dynamic modifications of binding properties. This functionality can also be
provided to the application layer.
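
The stack construction described above can be sketched in a few lines. This is only an illustration, not the Jonathan API: the class names `Session`, `TransportSession`, `QoSWrappingSession` and `BindingController`, the function `create_qos_binding`, and the idea of modelling each resource manager as a simple callable are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

QoS = Dict[str, int]
ResourceCheck = Callable[[QoS], bool]  # stands in for a resource manager (assumption)


class Session:
    """A logical channel that hands messages to the next session in the stack."""

    def __init__(self, lower: Optional["Session"] = None) -> None:
        self.lower = lower

    def send(self, message: bytes) -> bytes:
        return self.lower.send(message) if self.lower else message


class TransportSession(Session):
    """Bottom of the stack; a placeholder for a real protocol session."""

    def send(self, message: bytes) -> bytes:
        return b"TCP|" + message


class QoSWrappingSession(Session):
    """Hypothetical wrapping session that consults local and remote resource management."""

    def __init__(self, lower: Session, qos: QoS,
                 local_rm: ResourceCheck, remote_rm: ResourceCheck) -> None:
        super().__init__(lower)
        self.qos = dict(qos)
        self.granted = local_rm(self.qos) and remote_rm(self.qos)

    def send(self, message: bytes) -> bytes:
        if not self.granted:
            raise RuntimeError("requested QoS was not granted")
        return self.lower.send(message)


class BindingController:
    """Lets the application modify binding properties at run time."""

    def __init__(self, session: QoSWrappingSession,
                 local_rm: ResourceCheck, remote_rm: ResourceCheck) -> None:
        self.session = session
        self.local_rm = local_rm
        self.remote_rm = remote_rm

    def renegotiate(self, **changes: int) -> bool:
        wanted = {**self.session.qos, **changes}
        if self.local_rm(wanted) and self.remote_rm(wanted):
            self.session.qos = wanted
            self.session.granted = True
            return True
        return False  # the session is left unchanged


def create_qos_binding(qos: QoS, local_rm: ResourceCheck,
                       remote_rm: ResourceCheck) -> Tuple[QoSWrappingSession, BindingController]:
    """Binding factory: pushes the wrapping session on top of the session stack."""
    session = QoSWrappingSession(TransportSession(), qos, local_rm, remote_rm)
    return session, BindingController(session, local_rm, remote_rm)
```

In this sketch a rejected renegotiation leaves the existing QoS in place; a real binding could instead notify the application and let it decide.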

The interaction with the local resource management comprises contacting resource 
managers for processing resources also known as schedulers, managers for memory, 
and managers for local network interfaces. The interaction with remote resource man- 
agement is achieved by negotiating QoS with intermediate network nodes and the 
target end-systems of the binding. To achieve end-to-end QoS for an object binding 
the binding has to interact with both local and remote resource management. 





Enable QoS for Distributed Object Applications by ORB-Based Active Networking 229 



3 Resource Control Framework 

The resource control framework provides a set of abstractions needed by system de- 
signers, service suppliers and application programmers to build applications requiring 
and/or providing QoS properties. These abstractions address fields of concern that 
must necessarily be considered when dealing with such QoS properties. Operating
systems or platforms do not need to implement such abstractions themselves, but they must
offer programmers basic services on top of which such abstractions can be built.

The first goal of this resource control framework is to provide basic abstractions for 
designing and engineering: 

• resource multiplexing and scheduling mechanisms; 

• QoS handling mechanisms. 

The second goal of this framework is to provide guidelines for how to build 
"smart" resources and multiplexers for applications dealing with QoS constraints. The 
abstractions are therefore used to identify resource control design patterns. 

In order to be effectively instantiated and to execute, objects must be mapped onto 
hardware resources such as memory, network, external data storage, processors etc. 
The mapping is done by resource managers. The role of a resource manager is to let a 
resource, or a set of resources, be shared between objects. A manager will provide to 
these objects an abstract view of the resources it manages, and control the way these 
resources are used. Resource managers have to keep track of what resources have 
been granted to which identities. This is important for logging and enables higher 
services like accounting. It is also crucial for ensuring that components cannot exceed 
predefined restrictions of resource usage. 

The generic way of gaining access to resources is first to check the admittance to
resources and then to reserve them. Resources that are no longer needed are
unreserved. The semantics of the admit/reserve pattern is that resources which have been
admitted to a particular object stay so only for a predefined period of time. If the
resources are not reserved within this period, the admittance becomes invalid and
later reservations may fail. The admit/reserve pattern makes it possible to check the
availability of a chain of resources before issuing the reservation. This is essential for
end-to-end QoS. Nevertheless, concrete implementations of schedulers may, for the
sake of simplicity, choose to combine the admit and reserve operations into one operation.
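
A minimal sketch of the admit/reserve pattern for a single resource is given below. The class `ResourceManager`, its method names, the ticket scheme and the explicit `now` argument are assumptions made for illustration; the paper does not specify an interface. As a simplification, a reservation attempted after the admittance period always fails here, whereas the text only says that it may fail.

```python
import itertools
from typing import Dict, Optional, Tuple


class ResourceManager:
    """Tracks one resource (e.g. bandwidth units) with admit / reserve / unreserve."""

    def __init__(self, capacity: int, admit_ttl: float = 5.0) -> None:
        self.capacity = capacity
        self.admit_ttl = admit_ttl                       # admittance validity period
        self.reserved: Dict[int, int] = {}               # ticket -> amount
        self.admitted: Dict[int, Tuple[int, float]] = {}  # ticket -> (amount, expiry)
        self._tickets = itertools.count(1)

    def _expire(self, now: float) -> None:
        # Drop admittances whose validity period has passed.
        self.admitted = {t: (a, exp) for t, (a, exp) in self.admitted.items() if exp > now}

    def available(self, now: float) -> int:
        self._expire(now)
        held = sum(self.reserved.values()) + sum(a for a, _ in self.admitted.values())
        return self.capacity - held

    def admit(self, amount: int, now: float) -> Optional[int]:
        """Return a ticket if `amount` is available right now, otherwise None."""
        if amount > self.available(now):
            return None
        ticket = next(self._tickets)
        self.admitted[ticket] = (amount, now + self.admit_ttl)
        return ticket

    def reserve(self, ticket: int, now: float) -> bool:
        """Turn a still-valid admittance into a reservation."""
        self._expire(now)
        if ticket not in self.admitted:
            return False
        amount, _ = self.admitted.pop(ticket)
        self.reserved[ticket] = amount
        return True

    def unreserve(self, ticket: int) -> None:
        self.reserved.pop(ticket, None)
```

Passing the time in explicitly keeps the sketch deterministic; a real manager would read a clock instead.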

There are three major types of resource managers: schedulers manage the sharing 
of processing time, memory managers manage the sharing of memory resources, and 
node resource managers manage the sharing of local node resources. This paper fo- 
cuses on node resource managers. 

4 Node Resource Manager 

A node resource manager (NRM) is seen as an active network facility to control the
resources in a programmable router, e.g. bandwidth, queues and buffers. It is deployed
in each network node and is responsible for managing the use of local node resources. It
is the kernel module in an active distributed processing environment to support
network-wide services with respect to resource access and usage.

Fig. 3. Architecture of the node resource manager

Figure 3 depicts the role of an NRM in the context of an active DPE. An NRM
defines mechanisms that control the allocation of resources and synchronize access to them.
It makes use of the generic interface specified in [5], and provides a generic resource
API to network-wide services, including the high-level reservation service.

In an active network, resource allocation is a common request from network
service modules, e.g. an admission control function. It could be a request for a
minimum bandwidth for a flow, a class of service for packets, or a forwarding priority
for a flow with a particular protocol identifier. It is the major function that a resource
manager should provide. To support the resource needs of different services more
flexibly, and to maximize resource utilization, an NRM implements an adaptive
allocation facility. The facility dynamically adjusts the allocated resources to
accommodate new resource requests.

The necessity of such a facility can be illustrated by a simple scenario: in an intranet
built on an active network, a flow from a director has higher priority than a flow from
a normal employee. Data flows are thus supposed to have different priorities for
transmission. This requires a more flexible and dynamic configuration of the limited
network resources, to best fulfill different users’ resource requirements. That is,
high-priority flows should take precedence over lower-priority flows when resources are
allocated. As the overall resources are limited, an NRM should be adaptive, that is, able
to dynamically re-allocate resources to accommodate new higher-priority flows.

To allocate resources, an NRM maintains a view of the available node resources,
mainly the bandwidth, and the state of the queues that split the overall bandwidth. It
also maintains a view of the QoS-related parameters that a router is allowed to
operate on, e.g. discarding priority, queuing priority, and so on. Together this
information forms a local resource map, which may be associated with the allocation
requests currently alive in order to monitor resource usage.

An allocation process generally consists of several basic steps: look-up, partition
and admission. A look-up operation checks the available resources in the local
resource map; a partition operation allocates the required amount of resources; and the
admission operation notifies the service module about success or failure. On
success, a soft state is maintained for this allocation and updated periodically, so that
the allocation can be modified later and use of the allocated resources can be monitored.
Notably, the partition operation becomes more intelligent in order to realize adaptive
allocation. The following example illustrates the principle of adaptive resource
allocation and its result.
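
Before that example, the basic look-up, partition and admission steps with soft state can be sketched in code. The class `NodeResourceMap`, its fields and the 30-second lifetime are hypothetical choices for illustration; the sketch covers bandwidth only and does not yet include the adaptive partition step.

```python
from typing import Dict, List


class NodeResourceMap:
    """Hypothetical local resource map holding bandwidth and soft-state allocations."""

    def __init__(self, total_kbps: int, lifetime: float = 30.0) -> None:
        self.total_kbps = total_kbps
        self.lifetime = lifetime                    # soft-state lifetime in seconds
        self.allocations: Dict[str, List] = {}      # flow id -> [kbps, expiry time]

    def look_up(self, now: float) -> int:
        """Step 1: drop stale soft state and return the free bandwidth."""
        self.allocations = {f: a for f, a in self.allocations.items() if a[1] > now}
        return self.total_kbps - sum(kbps for kbps, _ in self.allocations.values())

    def partition(self, flow_id: str, kbps: int, now: float) -> None:
        """Step 2: set aside the requested amount as soft state."""
        self.allocations[flow_id] = [kbps, now + self.lifetime]

    def allocate(self, flow_id: str, kbps: int, now: float) -> bool:
        """Steps 1 to 3; the boolean result plays the role of the admission notice."""
        if kbps > self.look_up(now):
            return False
        self.partition(flow_id, kbps, now)
        return True

    def refresh(self, flow_id: str, now: float) -> bool:
        """Periodic update that keeps an allocation alive."""
        self.look_up(now)
        if flow_id not in self.allocations:
            return False
        self.allocations[flow_id][1] = now + self.lifetime
        return True
```

An allocation that is not refreshed simply times out, so a crashed service module cannot hold bandwidth indefinitely.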

Fig. 4. Mapping of flows to queues




Figure 4 shows the mapping of flows to queues. A local resource map maintains a
table "queues" that tracks the state of all the queues, and a table "flows" that keeps a
record of all the flows that request an amount of bandwidth. Each flow is associated with
a queue from which it obtains the requested bandwidth. The rationale for adaptive
allocation is preemption of resource occupation by flow priority, analogous to
preemption of CPU time by thread priority. There are two key research issues to resolve
in order to have a fair and efficient solution:

• Selection of lower-priority flows: grabbing resources from low-priority flows and
allocating them to higher-priority requests also means violating a previously given
guarantee. Such a violation should stay within the tolerance defined in a
service level agreement (SLA). Thus, when selecting low-priority flows, the flows’
priorities, their bandwidth and the associated SLAs must be cross-checked, and
appropriate algorithms should be defined to be fair to each flow and its user.

• Re-allocation of resources: the resources allocated to a single low-priority flow
might not be sufficient for a new flow with higher priority, so merging the resources of
multiple flows is preferable. In other cases, a resource partition is needed to
accommodate more than one high-priority flow. An efficient scheme remains to be
researched that avoids wasting resources, and operator-defined policies should be
supported.
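
The two issues above can be made concrete with a deliberately simple selection rule. The text leaves the actual algorithm open as a research issue, so the following is only one possible sketch and not the authors' scheme: it takes bandwidth from the lowest-priority flows first, never pushes a flow below an assumed SLA minimum, and merges the spare bandwidth of several flows when one is not enough. The names `Flow`, `sla_min_kbps` and `plan_preemption` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Flow:
    name: str
    priority: int       # larger value = more important
    kbps: int           # bandwidth currently granted
    sla_min_kbps: int   # lowest rate the flow's SLA tolerates (assumed SLA model)


def plan_preemption(flows: List[Flow], free_kbps: int,
                    new_priority: int, needed_kbps: int) -> Optional[Dict[str, int]]:
    """Return {flow name: kbps to take away}, or None if the request cannot be met."""
    shortfall = needed_kbps - free_kbps
    plan: Dict[str, int] = {}
    if shortfall <= 0:
        return plan  # enough free bandwidth, nobody is preempted
    # Candidates have strictly lower priority; the least important come first.
    victims = sorted((f for f in flows if f.priority < new_priority),
                     key=lambda f: f.priority)
    for flow in victims:
        spare = flow.kbps - flow.sla_min_kbps  # what the SLA tolerance allows
        take = min(spare, shortfall)
        if take > 0:
            plan[flow.name] = take
            shortfall -= take
        if shortfall == 0:
            return plan
    return None  # even the merged spare bandwidth is not sufficient
```

Returning a plan rather than applying it directly leaves room for an operator-defined policy to veto or adjust the selection before any guarantee is changed.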

With this technique, we aim at efficient allocation and fair usage
of network resources in an active network. An NRM should provide a generic resource
manager API that a wide range of QoS services are able to use. Considering the major
QoS frameworks, Intserv and Diffserv, we define an API that supports both
flow-based and Type of Service (ToS)-based resource allocation.

Specifically, the flow-based API allows identification of a flow and assignment of its
priority and required QoS. In addition, QoS tolerance and necessary notifications are also
supported by the API. The ToS-based API allows identification of packets with a
particular ToS value and mapping between ToS and QoS. In this API, a ToS value represents
the transmission priority of a packet and can be rewritten and remapped to an output
priority by the NRM to achieve dynamic resource allocation.
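
As an illustration of the ToS-based half of such an API, the sketch below keeps a ToS-to-priority table that can be remapped and an optional ToS rewrite table. The class `TosAllocator` and its methods are hypothetical names, and treating unknown ToS values as priority 0 (best effort) is an assumption of the example, not something the paper states.

```python
from typing import Dict, Tuple


class TosAllocator:
    """Hypothetical ToS-based part of an NRM resource API."""

    def __init__(self, tos_to_priority: Dict[int, int]) -> None:
        self.tos_to_priority = dict(tos_to_priority)  # ToS value -> output priority
        self.rewrite: Dict[int, int] = {}             # incoming ToS -> rewritten ToS

    def remap(self, tos: int, priority: int) -> None:
        """Change the output priority that a ToS value is mapped to."""
        self.tos_to_priority[tos] = priority

    def set_rewrite(self, tos: int, new_tos: int) -> None:
        """Mark packets carrying `tos` to be rewritten to `new_tos`."""
        self.rewrite[tos] = new_tos

    def classify(self, packet_tos: int) -> Tuple[int, int]:
        """Return (ToS to write into the packet, output priority)."""
        tos = self.rewrite.get(packet_tos, packet_tos)
        return tos, self.tos_to_priority.get(tos, 0)  # 0 = assumed best-effort default
```

Because both tables can be changed while traffic is flowing, the mapping can follow changes in resource allocation without touching the service modules that mark the packets.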

5 Active Component Management Service (ACMS)

The opportunity offered by active networks to dynamically install components for
execution in network nodes offers a high degree of flexibility and several other
advantages to network management. On the other hand, such an action exposes a serious
security issue: malicious or badly designed components could damage the active nodes or
cause them to malfunction. To tackle these drawbacks, our design is based
on the following rationale:

• Active components are installed in a policy-controlled way from internal or
external repositories.

• Policies are defined for the resource usage and allowed behavior of a component
and of the overall system. The security manager is consulted for all security-critical
activities.

An Active Component (AC) is a service component that executes within an Execution
Environment (EE) in an active node. An Active Component can maintain its state
across node-to-node transitions, or be stateless (no state is maintained). It could itself
be mobile (e.g. an agent) or could be transferred to the active node by third entities.
The ACMS allows AN entities (e.g. users, administrators, etc.) to install an AC on the
node and make use of it, or possibly make it available to third-party entities in a
policy-controlled way.

5.1 Architecture 

The architecture of the ACMS is depicted in Figure 5. The main components are:

• Active Component Manager: This is the front-end of the architecture. All requests 
are issued to, scheduled and executed or denied by this component. Other system 
and service components stored in ACM’s DB are loaded and instantiated by the 
ACM. 

• Security Manager: This component is responsible for all security-relevant
activities. It interacts with the Policy and Credential Managers in order to make security
decisions and grant or deny the issued requests. Checks are made to ensure that i) only
authorized users install and interact with the node’s services and ii) the policy on
resource usage by the installed components is enforced.

• Policy Manager: The policies for component/service access are maintained by this 
component. Via its interface authorized entities can dynamically modify the poli- 
cies in the EEs or those of the node. 

• Credential Manager: This component is responsible for managing the credentials
of users and ACs, e.g. certificates, public/private keys, etc.

• Audit Manager: All events are audited by this component for later use.

• Resource Manager: This module controls the allocation and access of the local 
node resources (computing resources and network resources). The access to re- 
sources is controlled in cooperation with the Security Manager. 

• ACM Repository: This is the repository in which ACs are stored. It could be an
external location accessed via known protocols such as HTTP, FTP or LDAP, or even
another ACM repository in another active node. Of course the AC could also reside
somewhere in the local file system.

Fig. 5. Active Component Management Service Architecture

Figure 5 shows the procedure of installing an active component. A possible
scenario that shows the interaction between the components is as follows:

1. Request: A request is made to the ACM to install a component/service. The re- 
quest might be issued explicitly by a user (the user is generally any authority - the 
difference is depicted via the policy scheme with the use of access rights) or im- 
plicitly as a side-effect of the setup of an object binding requiring a certain service 
on the active node. 

2. Security check: The ACM consults the Security Manager (SM) whether the speci- 
fied action is allowed or not. The SM verifies the credentials of the authority that 
issued the request in co-operation with the Credential Manager (CM). Then it 
checks with the Policy Manager (PM) what the current policy is. The Resource 
Manager (RM) is consulted whether the action is allowed or not. Finally the SM 
returns an accept or deny result for the specified action. 

3. Processing of the request: The ACM executes or denies the user request, e.g.
installation, deinstallation, instantiation, destruction, service start, service stop,
AC retrieval, service/code search, etc.

The actions following the last step vary as they depend on the nature of the request 
issued. We can have: 

• Download: if the request is valid and the components are not cached locally, the
service contacts another repository (e.g. via HTTP, LDAP, etc.) to download the
requested component.

• Resource allocation: after the component is downloaded, the appropriate resources
are allocated (i.e. a new job is created to run the tasks of the component).

• Instantiation: the component instantiates and executes in a policy-controlled
environment.

• Runtime checks: all interactions of the installed component with the resource
management are checked against the policy management. This ensures that the component
neither exceeds its predefined amount of resource usage nor violates the given
access rights.
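
The request, security check and processing steps can be summarized in a small sketch for the install case. Everything here is illustrative: the class `ActiveComponentManager`, its fields, and the reduction of the Credential, Policy and Resource Managers to a set, a dictionary and a memory counter are assumptions, and a real download from a repository is replaced by adding the name to a local cache.

```python
from dataclasses import dataclass, field


@dataclass
class ActiveComponentManager:
    credentials: set        # authorities the Credential Manager accepts
    policy: dict            # authority -> set of allowed actions (Policy Manager)
    free_memory_kb: int     # what the Resource Manager can still grant
    repository: dict        # component name -> memory it needs in kB
    cache: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def security_check(self, authority: str, action: str, component: str) -> bool:
        """Step 2: the Security Manager consults the other managers in turn."""
        if authority not in self.credentials:                # Credential Manager
            return False
        if action not in self.policy.get(authority, set()):  # Policy Manager
            return False
        needed = self.repository.get(component)              # Resource Manager
        return needed is not None and needed <= self.free_memory_kb

    def install(self, authority: str, component: str) -> bool:
        """Steps 1 and 3 for an install request; every decision is audited."""
        allowed = self.security_check(authority, "install", component)
        self.audit_log.append((authority, "install", component, allowed))
        if not allowed:
            return False
        if component not in self.cache:
            self.cache.add(component)  # stands in for the download step
        self.free_memory_kb -= self.repository[component]  # resource allocation
        return True
```

Runtime checks are left out of the sketch; they would repeat the policy and resource checks on each later interaction of the installed component.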

5.2 Implications of ACMS 

Such a service has several implications in an AN infrastructure. We will try to
comment here on the most obvious ones.

Security: The security of the AN is fortified as we can control via policy who
installs what, where and for how long. Furthermore we can control who has permission
at runtime to execute which components and under what environmental conditions
(e.g. available memory). Also, via the predefined node manipulation idea described
later, we can actually have an active node which is under the complete control of the
node administrator and yet programmable by third parties.

Safety: The existence of the ACMS can guarantee a higher level of safety in the
node. Many security violations occur not only because malicious software misuses the
node, but also because trusted code does not execute correctly. So we need a way to
be sure that the code that executes will not bring the node to an unstable state by
mistake. To achieve this, one could use safe languages such as PLAN or Netscript, but
this usually brings a performance penalty and limits programming flexibility. In our
approach, run-time safety is also a task of the resource manager, which monitors the
resource usage of each component and prevents access conflicts. The ACM’s role includes
setting up a sandbox with a limited resource space for each service, thereby
providing a notion of safety at the instantiation stage. Furthermore, the existence of
the ACMS allows the node owner to install his own AC on the node and allow third
parties to call and execute it. As he is the author of the code, he has already tested it
and knows that this code is safe to use (something that is not generally the case for code
coming from third parties). Moreover, AN node programming is not a
trivial activity, and programmers make different tradeoffs between code
functionality and code testing. The node owner can be expected to invest more effort
than the average user in testing and debugging the ACs that he installs, in order to
avoid future problems.

Predefined Node Manipulation: Many network operators are very much
concerned about the idea of executing code within a node, mainly because of the obvious
or hidden drawbacks such an action carries. For this category the ACMS can be a useful
tool, as it can provide specific interfaces for users to interact with the node. The
network operator itself installs the necessary code and services in the node and allows
the user to call this code with predefined and well-tested parameters. Although code is
again being executed, we can predict the result of this execution since the node’s status
will change to one of the predefined ones. This can be seen as a hybrid approach,
since an AC is executed (active network) but the node is actually manipulated via
predefined interfaces (programmable network). This is a very attractive approach for
network administrators who want to provide advanced functionality but are not willing to
allow execution of foreign code in their nodes.



6 Distributed Reservation Service 

This section aims to describe a scenario which demonstrates the advantages of the 
active DPE. For this purpose a generic resource reservation service is sketched as one 
active service that can be customized for different styles, e.g. reservation-in- 
advance [17] or immediate reservation (RSVP). 




ACMS = Active Component Management Service RS = Reservation Service 



Fig. 6. Distributed reservation service for object communication 

Figure 6 shows a distributed reservation service. This service consists of compo- 
nents sitting on top of the resource management on each active node. With the help of 
intra-service interfaces (2) the several components can offer network-wide interfaces 
to end-systems (1 and 3). In the end-systems these interfaces are accessed from within
the binding framework. A special wrapping session handles the QoS needs of the
application objects and interacts with the local resource management as well as with the
distributed reservation service.

The components forming the reservation service are stored in a trusted repository 
managed by a network operator. In the deployment stage those components are 
downloaded and installed on the active node by the ACMS. The run-time instance of a
reservation service component has its own limited resource space, allocated by the ACMS
and controlled by resource managers.

To provide a network-wide interface, the service instances on different nodes have
to interact. For this, an instance has to be able to obtain the interface references of
other instances in neighboring active nodes. This can be achieved by a centralized
naming service, or by a propagation protocol among active nodes that makes the
references known to adjacent nodes. The way the chain of service components is built is
specific to the implementation of the service and not part of the framework.

The QoS expected for the communication between objects can be specified when
the binding is created. Server objects may export their interfaces to the binding,
specifying the QoS they expect at their interfaces; client objects may import the
interfaces, also specifying the QoS they expect for the communication. In any case the
QoS has to be established along the communication path between client and server. This
can happen when a client imports an interface, i.e. connects to the binding, or on the
first call on an imported interface.

The request for establishing a QoS has to be propagated along the communication
path, and each node has to decide whether or not the request can be fulfilled. Following
the admit/reserve pattern described in Section 3 avoids reserving resources
without knowing whether the reservation is admissible on all intermediate
network nodes. Of course one has to take care of network nodes along the communication
path that do not support the distributed reservation service. This problem can
be solved by over-provisioning or by adapting to available reservation techniques.
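
A path-level version of the admit/reserve idea might look as follows. The names `Node` and `establish_qos` are hypothetical, and two simplifications are assumed: admittances do not expire between the two phases, and nodes that do not run the reservation service are simply skipped, which corresponds to relying on over-provisioning there.

```python
from typing import List, Optional


class Node:
    """One hop on the path; a capacity of None marks a node without the service."""

    def __init__(self, name: str, capacity: Optional[int]) -> None:
        self.name = name
        self.capacity = capacity
        self.reserved = 0

    @property
    def supports_reservation(self) -> bool:
        return self.capacity is not None

    def admit(self, kbps: int) -> bool:
        return self.reserved + kbps <= self.capacity

    def reserve(self, kbps: int) -> None:
        self.reserved += kbps


def establish_qos(path: List[Node], kbps: int) -> bool:
    """Phase 1: ask every capable node for admittance. Phase 2: reserve on all of them."""
    capable = [n for n in path if n.supports_reservation]  # other nodes are skipped
    if not all(n.admit(kbps) for n in capable):
        return False  # nothing has been reserved anywhere
    for n in capable:
        n.reserve(kbps)
    return True
```

Because reservation starts only after every capable node has admitted the request, a refusal at the last hop leaves no partial reservations to clean up.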

The purpose of the distributed reservation service is to provide resource reservation 
for the communication between a multitude of distributed objects. The main objective 
is to share the available resources between requesting applications as effectively as 
possible. For this, additional information such as priority policies or timetables could be
useful. The dynamic deployment of the service components allows a flexible response 
to the needs of applications. 



7 Implementation 

A prototype of the active DPE is implemented in Java using the modular and
extensible Jonathan ORB [14] as the basis for the binding and resource control frameworks.
The active DPE is deployed in a testbed consisting of three Hitachi Gigabit Routers
2000 and three controlling PCs running Linux. The active DPE runs in a
Java virtual machine on the controlling PCs and accesses the router command interface
via a Telnet connection. The router’s command interface is wrapped by Java
objects forming a generic router API. Currently the DPE does not support packet
processing; it only features the management of router resources.




Fig. 7. Details of the logical active node 

8 Conclusion

The paper describes an ORB-based active networking approach. A framework for an
active DPE is presented to integrate programmable networks, active networks, and
distributed object technology. The framework supports the execution of network
services as downloaded active components to provide QoS. These services flexibly
program network resources through a more dynamic and efficient resource manager
interface. Their parallel execution is controlled, particularly with respect to resource
access and usage, to ensure safety. Distributed object applications obtain their expected
QoS with the support of a generic reservation service, which is decoupled from
the underlying protocol and can be dynamically customized. In this framework, QoS
support is transparently embedded in the communication stack as part of the binding
action.

Policy is generally considered important as a means for network administrators to
control the active network. A dedicated policy-based management system interacts 
with other major services such as resource control, installation, and reservation. A 
policy defines identities so that decisions can be made whether a request should be 
granted or denied. The role of policies is to associate identities with rules that deter- 
mine the access and usage of resources. 

As for any other framework, the design of components is crucial. Great care has 
to be taken when specifying the components' interfaces: new services or applications 
should be able to treat the offered components as building blocks and take advantage 
of them by composing the provided functionality. 
Abstract. Recently, active networks have been highlighted as a key 
enabling technology to obtain immense flexibility in terms of network 
deployment, configurability, and packet processing. Exploiting this 
flexibility, we present an active network application for real-time 
speech transmission where plugin modules are downloaded onto certain 
network nodes to perform application-specific packet processing. In 
particular, we propose to perform loss concealment algorithms for voice 
data streams at active network nodes to regenerate lost packets. The 
regenerated speech data streams are robust enough to tolerate further 
packet losses along the data path so that the concealment algorithms at 
another downstream node or at the receiver can still take effect. We call 
our approach active concealment for speech transmission to distinguish 
it from concealment performed at the receiver. Our approach is 
bandwidth-efficient and retains the applications’ end-to-end semantics. 



1 Introduction 

Most real-time multimedia applications are resilient and can tolerate occasional loss 
of packets to some extent but are sensitive to packet losses which are not in 
accordance with their flow structure. For example, Internet voice applications that use 
(sample-based) waveform-coded speech signals can exploit speech properties to 
conceal isolated losses very well. However, speech quality drops significantly when 
burst losses occur [4]. The Adaptive Packetization and Concealment (AP/C) 
technique successfully demonstrates that speech properties can be efficiently 
exploited to improve the perceived quality at the application layer [8]. However, as 
AP/C exploits the property of speech stationarity, its applicability is typically limited 
to isolated, i.e. non-consecutive, losses. Under circumstances where the rate of losses 
that occur in bursts is high, AP/C does not obtain any significant performance 
improvement compared to other techniques. We believe that this is the point where 
flexibility provided by active network nodes can be exploited to help applications at 
end systems to perform better. 
In this work, we present an active network application where concealment 
algorithms are performed at network nodes to regenerate lost packets and to inject 
them into voice streams. The rest of this paper is structured as follows. First, we 
briefly review related work and the AP/C algorithm which we download to active 
network nodes to perform loss concealment for audio streams. We then present our 
approach of placing active network nodes at certain locations within the networks to 
leverage the efficiency of the receiver’s concealment performance. We perform a 
simulation study to evaluate the efficiency of our approach. We finally give 
conclusions of our work and end the paper with an outline of our future work. 



2 Related Work 

Recently, it has been proposed to push more intelligence into the networks to perform 
application-specific packet processing and actions at network nodes [9]. Significant 
performance improvements can be gained thanks to network nodes’ application- 
specific packet processing which takes into account the characteristics of packet 
payload. This is especially true for multimedia data which has a specific flow 
structure. Typical examples for application-specific packet processing at network 
nodes are media transcoding [1], media scaling [6], packet filtering [2], or discarding 
[3] for video distribution on heterogeneous networks with limited bandwidth. 
Surprisingly, there are very few active network projects that exploit active network 
nodes’ capability of application-specific packet processing to improve the quality of 
Internet voice or audio transmissions. The only work we are aware of is [2] where 
active network nodes add an optimal amount of redundant data on a per-link basis to 
protect audio streams against packet loss. Since most packet losses on the Internet are 
due to congestion (except for wireless networks), we argue that it can be problematic 
to transmit redundant data onto a link which is already congested. We propose an 
approach where application-specific packet processing is performed at an 
uncongested active network node to regenerate audio packets lost due to congestion at 
upstream congested nodes. 



3 Adaptive Packetization and Concealment 

AP/C exploits the speech properties to influence the packet size at the sender and to 
conceal the packet loss at the receiver. In AP/C, the packet size depends on the 
importance of the voice data contained in the packet with regard to the speech quality. 
In general, voiced signals are more important to the speech quality than unvoiced 
signals. Thus, if voiced signal segments are transmitted in small-size packets and 
unvoiced signal segments are transmitted in large-size packets and if the packet loss 
probability is equally distributed with regard to the packet size, more samples for 
voiced speech are received than for unvoiced speech. Considering the higher 
perceptual importance of voiced signal segments, this results in a potentially better 
speech quality when using loss concealment at the receiver. The novelty of AP/C is 
that it takes the phase of speech signals into account when the data is packetized. 
AP/C assumes that most packet losses are isolated and that the packets preceding and 
following a lost packet are correctly received at the receiver. In AP/C, the receiver conceals 
the loss of a packet by filling the gap of the lost packet with data samples from its 
adjacent packets. Regeneration of lost packets with sender-supported pre-processing 
works reasonably well for voiced sounds thanks to their quasi-stationary property. 
Regeneration of lost packets works less well for unvoiced sounds due to their random 
nature. However, this is not necessarily critical because unvoiced sounds are less 
important to the perceptual quality than voiced signals. Since the phase of the speech 
signal is taken into account when audio data is packetized, fewer discontinuities than 
with conventional concealment algorithms are present in the reconstructed signal. 



3.1 Sender Algorithm 

In AP/C, an audio “chunk” is defined as a segment of audio data that has the length of 
the estimated pitch period. In order to reduce the protocol header overhead, two 
audio chunks are copied into an audio packet and transmitted onto the network. When 
a packet loss is detected at the receiver, adjacent chunks of the previous and the 
current packet¹ are used to reconstruct the lost chunks. Information on the length of 
chunks belonging to those packets is transmitted as additional information in the 
current packet using the RTP header extension to help the receiver with the 
concealment process (“intra-packet boundary”). 

In order to estimate the pitch period, the auto-correlation of the audio input 
segment is calculated. Then the largest value of the auto-correlation other than the 
maximum at zero² is searched for. This maximum value, its position, and 
the auto-correlation itself help to make the decision whether the input segment is 
voiced or unvoiced. If the input segment is classified as voiced, the position of this 
maximum is said to be the estimated value of the pitch period because the input 
segment shifted by that length is most similar to itself. If the input segment is 
classified as unvoiced, the sender takes an audio chunk that has the length of T_max (in 
AP/C, T_max is the correlation window size and is chosen to be 160 samples, 
corresponding to 20 ms of speech). The found audio chunk is copied from the audio 
input buffer into an audio packet and the start position of the input segment is moved 
forward by the length of the audio chunk. 
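
The pitch estimation and chunk selection described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the lag search range (`T_MIN`), the voiced/unvoiced threshold (`VOICED_THRESHOLD`), and all function names are hypothetical, and the voiced/unvoiced decision is reduced to a single ratio test, whereas the text says the auto-correlation itself also enters the decision.

```python
import math

T_MAX = 160             # correlation window size given in the text (20 ms of speech)
T_MIN = 20              # assumed smallest pitch period searched (hypothetical value)
VOICED_THRESHOLD = 0.5  # assumed decision threshold (hypothetical value)


def autocorrelation(segment, lag, window=T_MAX):
    """Unnormalized auto-correlation of `segment` at `lag` over `window` samples."""
    return sum(segment[n] * segment[n + lag] for n in range(window))


def next_chunk_length(segment):
    """Return (chunk_length, voiced) for a segment of at least 2 * T_MAX samples."""
    if len(segment) < 2 * T_MAX:
        raise ValueError("segment must hold at least 2 * T_MAX samples")
    r0 = autocorrelation(segment, 0)  # absolute maximum, found at lag zero
    if r0 == 0:
        return T_MAX, False  # silent input: treated as unvoiced here
    # Largest auto-correlation value away from lag zero, and its position.
    best_lag = max(range(T_MIN, T_MAX),
                   key=lambda lag: autocorrelation(segment, lag))
    voiced = autocorrelation(segment, best_lag) / r0 > VOICED_THRESHOLD
    # Voiced: the chunk has the estimated pitch period; unvoiced: it has T_MAX.
    return (best_lag if voiced else T_MAX), voiced


# Usage: a synthetic "voiced" segment with a pitch period of 80 samples.
tone = [math.sin(2 * math.pi * n / 80) for n in range(2 * T_MAX)]
length, voiced = next_chunk_length(tone)
```

After each call, the sender would copy `length` samples into the packet and advance the start of the input segment by the same amount, as described in the text.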



3.2 Receiver Algorithm 

The receiver uses RTP message sequence numbers to detect packet loss and applies 
the AP/C concealment algorithm when an isolated loss is found³. RTP timestamp and 
information on the intra-packet boundary are used to determine the lost chunks’ 



¹ The packet carrying the sequence number that allowed the detection of a previous packet loss. 

² Of course the absolute maximum value of the auto-correlation is found at 0 because a signal 
without any shift is most similar to itself. 

³ In [7], we presented a scheme that combines AP/C with interleaving to cope with small packet 
burst loss. However, this scheme suffers from the additional buffer delay which is necessary 
at the sender. 
lengths. If silence suppression is enabled and there is a silent period between the lost 
packet and its adjacent packets, the lost chunks’ lengths are not determined correctly. 
This is because the RTP sequence number increments by one for each transmitted packet 
and the RTP timestamp increments by one for each sampling period regardless of 
whether data is sent or dropped as silent. Thus, only the length of one lost chunk can 
be determined. Because the chunks’ length is smaller than T_max and a silent period is 
usually longer than 20 ms (corresponding to 160 μ-law audio data samples), this 
problem can be easily detected when the length of a lost chunk is larger than T_max. 
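
The loss detection step can be illustrated with a small sketch. It assumes 16-bit RTP sequence numbers and 32-bit RTP timestamps that count sampling periods; the function names are hypothetical, and `silence_suspected` is a simplified variant of the check in the text: it tests the whole timestamp gap against two chunks of at most T_max samples each, rather than testing an individual chunk length.

```python
T_MAX = 160        # maximum chunk length in samples, from the text
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits wide
TS_MOD = 1 << 32   # RTP timestamps are 32 bits wide


def classify_gap(prev_seq, cur_seq):
    """Classify the gap between two consecutively received packets."""
    missing = (cur_seq - prev_seq) % SEQ_MOD - 1
    if missing <= 0:
        return "none"
    return "isolated" if missing == 1 else "burst"


def missing_samples(prev_ts, prev_len, cur_ts):
    """Sampling periods between the end of the previous and the start of the current packet."""
    return (cur_ts - prev_ts - prev_len) % TS_MOD


def silence_suspected(prev_ts, prev_len, cur_ts):
    """Assumed check: one lost packet carries two chunks of at most T_MAX samples each."""
    return missing_samples(prev_ts, prev_len, cur_ts) > 2 * T_MAX
```

Only gaps classified as "isolated" would be handed to the concealment algorithm; a gap that is too long to be a single lost packet hints at a silent period in between.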
Due to the pre-processing at the sender, the receiver can assume that the chunks of 
a lost packet are similar to the adjacent chunks. The adjacent chunks (c12 and c31 in 
Fig. 1) are resampled in the time domain to match the size of the lost chunks and then 
used to fill the gap of the lost packet. A linear interpolator as in [10] is used to 
perform resampling. The replacement signals produced by the linear interpolator have 
a correct phase, thus avoiding discontinuities in the concealed signal that would lead 
to speech distortions while still maintaining the pitch frequency at the edges. Due to 
the pre-processing at the sender, the lost and the adjacent chunks have a high 
probability to be similar. Thus, the concealment operation introduces no specific 
distortion in the concealed speech segments. Fig. 1 illustrates the concealment 
operation in the time domain. 
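
The resampling step can be sketched as below. This is a minimal illustration, not the interpolator of [10]: the function names are hypothetical, and it is assumed that the chunk preceding the gap fills the first lost chunk while the chunk following the gap fills the second.

```python
def resample_linear(chunk, target_len):
    """Stretch or shrink `chunk` to `target_len` samples by linear interpolation."""
    if target_len <= 0 or not chunk:
        return []
    if target_len == 1 or len(chunk) == 1:
        return [float(chunk[0])] * target_len
    scale = (len(chunk) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale                    # fractional position in the source chunk
        lo = int(pos)
        hi = min(lo + 1, len(chunk) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * chunk[lo] + frac * chunk[hi])
    return out


def conceal_packet(prev_chunk, next_chunk, lost_len_1, lost_len_2):
    """Build replacement samples for the two chunks of one lost packet."""
    return (resample_linear(prev_chunk, lost_len_1)
            + resample_linear(next_chunk, lost_len_2))
```

Because the first and last output samples coincide with the first and last samples of the source chunk, the replacement keeps the chunk's phase at its edges, which is the property the text relies on to avoid discontinuities.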







Fig. 1. Concealment operation in the time domain 



4 Active Concealment 

Since AP/C assumes that most packet losses are isolated, it does not obtain any 
significant performance improvement compared to other techniques when the rate of 
burst losses is high. We believe that this is the point where active network nodes’ 
capability of application-specific packet processing can be exploited to help 
applications at end systems perform better. Since the burst loss rate of a data flow at a 
network node is lower than at the receiver, the AP/C concealment algorithm works 
more efficiently and more lost packets can be reconstructed when concealment is 
performed within the network rather than just at end systems. We thus propose to 
download and perform the AP/C concealment algorithm at certain active network 
nodes where the number of burst losses of a voice data stream is sufficiently low to 
regenerate the lost packets. The regenerated audio stream is robust enough to tolerate 
further packet losses so that the AP/C concealment algorithm can still take effect at 
another downstream active network node or at the receiver. 
The idea of our approach is demonstrated in Fig. 2. The AP/C sender algorithm is 
performed to packetize audio data taking the phase of speech signals into account. 
Along the data path, packets 2 and 4 are lost. Exploiting the sender’s pre-processing, 
the AP/C concealment algorithm is applied at an active network node within the 
network to reconstruct these lost packets. Downstream of the active network node, 
another packet is lost (packet 3), which is easily reconstructed at the receiver. In this 
scenario, active concealment reconstructs six lost chunks (c21, c22, c31, c32, c41, and c42) 
and clearly outperforms the receiver-only concealment [8], which can reconstruct 
at most two chunks (c21 and c42) due to the burst loss accumulated along the end-to-end 
data path. 
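
The chunk counts in this scenario can be reproduced with a small model. The sketch below assumes a simplified rule that is not stated in this form in the text: a lost chunk counts as reconstructable if at least one directly neighbouring chunk was received (or has been regenerated upstream). The function name and the five-packet stream are hypothetical.

```python
def reconstructable_chunks(num_packets, lost_packets):
    """Return the lost chunks (packet, index) that have a received neighbouring chunk."""
    chunks = [(p, k) for p in range(1, num_packets + 1) for k in (1, 2)]
    lost = {c for c in chunks if c[0] in lost_packets}
    recovered = []
    for i, chunk in enumerate(chunks):
        if chunk not in lost:
            continue
        left_ok = i > 0 and chunks[i - 1] not in lost
        right_ok = i + 1 < len(chunks) and chunks[i + 1] not in lost
        if left_ok or right_ok:
            recovered.append(chunk)
    return recovered


# Active concealment: the node sees the isolated losses of packets 2 and 4,
# regenerates them, and the receiver then sees only the isolated loss of packet 3.
at_node = reconstructable_chunks(5, {2, 4})
at_receiver = reconstructable_chunks(5, {3})
active_total = len(at_node) + len(at_receiver)

# Receiver-only concealment: packets 2, 3 and 4 arrive as one burst loss.
receiver_only = reconstructable_chunks(5, {2, 3, 4})
```

Under this rule the active case recovers six chunks and the receiver-only case recovers only c21 and c42, which matches the counts given in the text.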




Fig. 2. Active concealment 



Our approach is similar to Robust Multicast Audio (RMA) proposed by Banchs et 
al. in [2], but it acts in a reactive way upon detection of packet loss in audio data 
streams. In contrast to RMA, which transmits redundant data on a per-link basis 
to protect audio streams against packet loss in a proactive way, our approach simply 
regenerates and injects the lost packets into audio streams and thus is more 
bandwidth-efficient. Another advantage of our approach is that it does not break the 
applications’ end-to-end semantics and does not place any further demand on the 
number and location of active network nodes performing the concealment algorithm⁴. 
RMA, however, requires active network nodes to be located at both ends of a link or a 
network to perform FEC encoding and decoding. 



⁴ Clearly, the number and location of active network nodes influence the performance 
improvement. However, the applications’ functionality is not affected under any 
circumstances. 
5 Simulations 

We perform simulations to study the performance of active concealment at network 
nodes. As a first step towards the transition from traditional to active networks, we 
assume that there is only one active network node in the path from the sender to the 
receiver where intra-network regeneration of lost packets can be performed. The 
logical network topology for our simulation is shown in Fig. 3 where a lossy network 
can consist of multiple physical networks comprising several network hops. We use 
the Bernoulli model to simulate the individual loss characteristics of the networks. 
The efficiency of the schemes presented in this section is evaluated by using objective 
quality measurements such as in [5] and [11] to determine the speech quality. 
Objective quality metrics employ mathematical models of the human auditory system 
to estimate the perceptual distance between an original and a distorted signal⁵. 
Objective quality measurements should thus yield result values which correlate well 
and have a linear relationship with the results of subjective tests. We apply the 
Enhanced Modified Bark Spectral Distortion (EMBSD) method [11] to estimate the 
perceptual distortion between the original and the reconstructed signal. The higher the 
perceptual distortion is, the worse the obtained speech signal at the receiver is. The 
MNB scheme [5], though showing high correlation with subjective testing, is not used 
because this quality measurement does not take into account speech segments with 
energy lower than certain thresholds when speech distortion is estimated. 
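
The loss model used in the simulations can be sketched as follows. This is an illustrative sketch only: each lossy network is modelled as an independent Bernoulli drop per packet, as stated above, but the function names, the seed handling, and the way isolated and burst losses are counted are assumptions made here.

```python
import random


def simulate_path(num_packets, drop_probs, seed=0):
    """Return a list of booleans, True where a packet is lost somewhere on the path."""
    rng = random.Random(seed)
    lost = []
    for _ in range(num_packets):
        # A packet is lost if any lossy network drops it (independent Bernoulli trials).
        lost.append(any(rng.random() < p for p in drop_probs))
    return lost


def loss_statistics(lost):
    """Return (loss_rate, isolated_losses, burst_losses), the last two counted in packets."""
    isolated = burst = run = 0
    for flag in lost + [False]:  # the sentinel closes a trailing run of losses
        if flag:
            run += 1
        elif run:
            if run == 1:
                isolated += 1
            else:
                burst += run
            run = 0
    rate = sum(lost) / len(lost) if lost else 0.0
    return rate, isolated, burst


# Usage: two lossy networks with drop probability 0.09 each.
rate, isolated, burst = loss_statistics(simulate_path(200000, [0.09, 0.09]))
```

Comparing the isolated and burst counts observed after the first network with those observed at the end of the path shows why concealment inside the network sees fewer burst losses than concealment at the receiver.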






Fig. 3. Simulation topology 

This section is organized as follows. In the first simulation step, we 
use the same parameter sets for the lossy networks. We then compare the speech 
quality obtained by the active loss concealment with two reference schemes. In the 
second simulation step, we vary the parameter sets of the lossy networks and measure 
the efficiency of the active loss concealment. The parameter sets are chosen in such a 
way that the packet loss rate observed at the receiver is constant. This simulation step 
is performed to determine the optimal location of the active network node where the 
active concealment algorithm can be downloaded and performed. 




5.1 Performance Comparison to Reference Schemes 

In this simulation step, we compare the speech quality obtained by active loss 
concealment with two reference schemes: In the first reference scheme, the sender 
transmits voice data in packets with constant size and the receiver simply replaces 



⁵ We use a speech sample that consists of different male and female voices and has a length of 
25 s. 
data of a lost packet by a silent segment with the same length. Each packet in this 
scheme contains 125 speech samples, resulting in the same total number of packets as 
the second reference scheme and the active loss concealment scheme. The second 
reference scheme is the AP/C scheme applied only at end systems. Packets are sent 
through two lossy network clouds and are dropped with the same packet drop 
probability. 

The parameters used in this simulation step and the resulting packet loss rate are 
shown in Table 1. 



Table 1. Parameters and packet loss rate used in simulation for performance comparison 

Packet drop probability    0.03      0.06      0.09      0.12 
Packet loss rate           0.0592    0.1164    0.1720    0.2257 



Fig. 4 shows the results of this simulation step, plotting the perceptual distortion 
measured by EMBSD versus the network clouds’ packet drop probability. The results 
demonstrate that the higher the packet drop probability is, the higher the perceptual 
distortion of the schemes and thus the worse the speech quality is. 
Fig. 4. Performance comparison to reference schemes (simulation step 1) 



AP/C performs better than reference scheme 1 which replaces lost packets by silent 
segments, and the active loss concealment obtains the best speech quality. When the 
network clouds’ packet drop probability is low, the active loss concealment does not 
gain any significant improvement compared to the AP/C scheme. This is because 
AP/C performs sufficiently well when the network loss rate is low and the number of 
burst losses is negligible. However, when the packet drop probability rises and the 
burst loss rate is no longer negligible, the perceptual distortion obtained with AP/C 
increases significantly and the active loss concealment achieves a clear improvement 
as compared to AP/C. 
5.2 Optimal Active Network Node Location 



In this simulation step, we vary the parameters of the lossy network clouds to 
determine the optimal location of the active network node. This simulation step is 
intended to help answer the following question: “Given that there are the same loss 
characteristics along the data path, where is the most effective location to download 
and perform the active concealment algorithm?” 

The packet loss rate of a data path consisting of two network clouds with packet 
drop probabilities p_1 and p_2 is given by 

p = 1 - (1 - p_1)(1 - p_2) = p_1 + p_2 - p_1 p_2    (1) 

Thus, given the packet loss rate p and the packet drop probability p_1 of the first 
lossy network cloud, the packet drop probability of the second lossy network cloud is 
determined by 

p_2 = \frac{p - p_1}{1 - p_1}    (2) 
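
The relation between the end-to-end packet loss rate and the drop probabilities of the two clouds can be checked numerically. The sketch below assumes independent losses in the two clouds, so that a packet survives only if both clouds forward it; the function names are hypothetical.

```python
import math


def path_loss_rate(p1, p2):
    """End-to-end loss rate of two clouds in series with independent drops."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)


def second_cloud_drop_probability(p, p1):
    """Drop probability of the second cloud that yields end-to-end loss rate p."""
    if not (0.0 <= p1 <= p <= 1.0 and p1 < 1.0):
        raise ValueError("require 0 <= p1 <= p <= 1 and p1 < 1")
    return (p - p1) / (1.0 - p1)


def balanced_drop_probability(p):
    """Common drop probability p1 = p2 that yields end-to-end loss rate p."""
    return 1.0 - math.sqrt(1.0 - p)


# Analytic loss rates for the equal drop probabilities used in Table 1.
analytic = [round(path_loss_rate(d, d), 4) for d in (0.03, 0.06, 0.09, 0.12)]
```

The analytic values (0.0591, 0.1164, 0.1719, 0.2256) are close to the loss rates reported in Table 1, which is consistent with the independence assumption.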



The result of this simulation step is presented in Fig. 5 using EMBSD to compute 
the perceptual distortion of the obtained speech signal at the receiver. 




It shows that the optimal location to download and perform the active concealment 
algorithm is where the packet loss rate from the sender to that location is equal to the 
packet loss rate from there to the receiver (p_1 = p_2). Note that while we show the 
simulation results for p_1 = p/2, the actual minimum is located 
at p_1 = p_2 = 1 - \sqrt{1 - p} \approx 1 - (1 - p/2) = p/2. If on one hand the packet loss rate from the 
sender to the location of the active network node is too high (p_1 \gg p_2), the active 
concealment algorithm cannot exploit its advantage in terms of the location as 
compared to concealment just at the receiver. On the other hand, if the packet loss rate 
from the active network node to the receiver is too high (p_1 \ll p_2), the concealment 
algorithm at the active network node cannot be employed efficiently, because the 
majority of losses happen at subsequent network nodes. This effect is increasingly 
important when the packet loss rate (and thus the packet drop probability) increases, 
leading to a higher number of burst losses which causes the “conventional” 
concealment algorithm to fail. 

6 Conclusions 

We have presented a new active network application for voice over IP that exploits 
the flexibility of active networks to perform application-specific packet processing. 
By taking into account characteristics of the packet payload the efficiency of 
application-level algorithms has been leveraged. We have performed a simulation 
study to evaluate the efficiency of our approach. Simulation results have 
demonstrated that significant speech quality improvements are achieved compared to 
pure application-level algorithms. We also have run simulations to find the optimal 
location in a data path to download and perform the active loss concealment 
algorithm. It has been shown that the optimal location is where the network loss 
conditions are identical in both the up- and down-stream direction from the active 
node (considering deployment at only one active network node). An unoptimized 
software implementation of the active loss concealment reconstructs a lost packet 
with an average execution time overhead of 220 μs on a PC with a Pentium III 500 
MHz CPU and 128 Mbytes RAM. Thus it is obvious that the active loss concealment 
does not significantly increase the end-to-end packet transmission delay. Since the 
active network node only performs packet regeneration for a small portion of packets 
of voice streams, the average consumption of node resources is reasonably low. With 
an optimized implementation, a significant reduction of additional delay and overhead 
in terms of node resource consumption can be expected. 

While the resource consumption is thus not a problematic issue, the security and 
deployment implications of our scheme still need to be fully evaluated. For public 
multicast sessions, the operation is as follows: the active node adds itself as a member 
to the session. It receives and buffers enough data to perform loss concealment and 
then injects data on behalf of the sender (i.e. with the sender’s IP address) to the 
session. For unicast sessions the situation is more complicated as copies of packets of 
a particular connection need to be diverted to the loss concealment algorithm at a 
node and again re-created packets need to be sent in a way that pretends that they 
originated at the original sender. We consider these issues to be severe; however, they 
fall into the general problem area of active network security. Thus, such problems 
should be solved at the active network platform level, i.e. the entities which provide 
for the deployment and execution of active code. We employ our algorithm on the 
BANG (Broadband Active Network Generation, [12]) platform which provides the 
needed security support. 
Additional future work includes the investigation of active network applications 
where a number of active network nodes can be placed along the data path to 
download and perform the active loss concealment algorithm. Besides, it would be 
interesting to answer the question of how well and how many times active 
loss concealment can be performed in a recursive way. Furthermore, since both 
application-level Forward Error Correction and application-specific packet processing 
incur additional consumption of network resources, we plan to compare these two 
approaches. The result of this comparison might enable an optimal combination of the 
two approaches to obtain further improvement of speech quality. 
Abstract. Customers now expect advanced IP services like video 
conferencing and application rental to become customizable and easy to 
use. So, network service providers must be able to dynamically deploy 
such Internet services fitting the requirements of service customers 
while avoiding the expensive solution of over-provisioning the service. 
However, the current Internet technologies are lacking when it comes to 
developing and deploying new services and integrating service 
offerings. This paper presents a service architecture which enables the 
creation, on-demand deployment and management of advanced IP 
services using the active network capabilities. In our approach, each 
business IP service is considered as a composition of basic service 
components which can be shared among different business IP services. 
Upon the requests of service users and Service Definition information, 
initialized by providers, mobile code is downloaded into active nodes 
and installed in the network as new basic service components. Finally, 
this paper presents the implementation of a Content Delivery Service 
using our proposed service architecture. 



1 Introduction 

Until recently, the use of the Internet has been almost limited to a huge source of in- 
formation and a platform for e-commerce. But, users expect the Internet to become a 
multiservice worldwide platform, supporting a full range of advanced IP services 
including video conferencing, application rental or network-based training services. 
Actually, estimates of the economic impact of advanced IP services point to a market 
worth $120 billion annually in 10 years. By 2002, data services are expected to grow 
to around 70 percent of bandwidth. According to these figures, the future of advanced 
IP services seems to be glittering, but the current success of the Internet could also 
constitute the main barrier for its promising future. In fact, the enormous success of 
the Internet has led to serious problems for the Network Service Providers. 

Firstly, customers now expect advanced IP services to become more intelligent, e.g. 
adaptable to each customer’s specific requirements. This service customization may 
involve different service compositions and service presentations, for instance offering 
different types and contents of user interface to customers. Besides, providers are also 
facing the explosion of the number of telecommuters and mobile users. These customers 
move from one site to another, but want to keep the possibility of accessing their 
services from everywhere with good quality. Likewise, these people want to access 
the Internet from different types of terminal (laptop, PDA, mobile phone, etc.). 
In this situation, over-provisioning the resources is likely to be less cost-effective 
in a multi-service network than rationing the resources [10]. In other 
words, the service deployment process should be done on demand, according to the 
users’ demand, the condition of the network or the price of available resources. 
Finally, customers expect future IP services to become “consumable”, e.g. pay-per-use, 
and easy to use like the telephone today. Network service providers have to drastically 
improve the service quality while offering customizable services deployable on 
demand. 

However, creating, deploying and managing this type of service constitutes a real 
burden for network operators. In order to support this strong evolution, IP networks 
have to be flexible enough to enable the development of such services. 
However, the development of IP services is currently restricted by the characteristics of the 
network. The current Internet technologies are lacking when it comes to developing 
and deploying new services and integrating service offerings. As active 
networking technology has been chiefly designed to enable the insertion of new services 
in the network [8,11,12], it can be a suitable solution to these issues. Indeed, this 
technology enables customization of the network by inserting 
new functionalities into it. 

The remainder of this paper is organized as follows. The following section details a 
possible business service architecture to support advanced IP services and introduces 
why the active network technology can provide a suitable answer to the 
aforementioned issues. Then, section 3 defines the concept of service composition and 
the main two components of our multi-tier architecture: the Service Broker and the 
Broker Controller. Section 4 details the lifecycle of a service including its creation, 
deployment and management. Afterwards, section 5 considers the example of the 
“content delivery service” to illustrate the architecture presented in the previous sec- 
tions. Finally, section 6 ends with concluding remarks and proposes some directions 
for further work. 



2 IP Services Evolution 

While IP traffic continues to grow exponentially, generating high-margin revenue and 
creating unique customer value from selling pure bandwidth remains a challenge. 
Competitive pressures and technological advances have chiseled away the high mar- 
gins on data access and bandwidth. Due to ongoing improvements in broadband ac- 
cess technologies, the revenue produced from the bandwidth commerce is relatively 
small compared to voice services. Data represent 50 percent of the bandwidth use but 
only 12 percent of the revenues. So, network service providers must provide customers 
with services going beyond the classical network services. They must play a role 
in providing these customizable IP services on demand. However, providing such 
services is not an easy task in the classical and static Internet. 




Multi-tier Architecture for Service Creation, Deployment and Management 



As Juhola et al. mentioned [7], Active Networking can improve the ability of 
network providers to penetrate the lucrative Internet-based markets by bringing the 
following benefits: inherent mobility of users, more available services based on a 
library of basic service components, additional flexibility to facilitate fast service 
introduction and enhancements for complex services. Besides, third-party development 
of value-added services will arise, providing customers with an increased range of 
services in response to their various needs. According to this evolution, we consider, 
in this document, a business service model [6] composed of three actors that have 
unique roles in supplying such advanced IP services: the Application Service Provider 
(ASP), the Network Service Provider (NSP) and the Service Portal. 

In this business architecture, the ASP focuses its activity on providing a growing 
range of advanced IP services including application rental or video conferencing. So, 
the ASP develops the code necessary for the implementation of these IP services that 
will be installed in the network. On the other hand, the NSP needs to sell high-margin 
data application services analogous to the multitude of enhanced voice services that 
are offered to customers along with basic voice transport. Besides, the NSP offers 
pure network services, such as VPNs, as well as network services packaged with 
application services offered by application service providers, such as application rental. 
Finally, the Service Portal plays the role of service retailer for the IP services 
developed by the ASP, providing a portal to individual and corporate customers. 

According to this business service model, individual or corporate customers connect
to these service portal sites and choose, from a list, the service that fits their
requirements at that moment. For instance, if a corporate customer needs to create a
video conferencing session between different corporate sites, he wants to be able to
activate this IP service easily and only on demand. Customers must be able to easily
activate and de-activate services. Today, provisioning such a service is time- and
resource-consuming. Likewise, the content of these portals should be customized to
each customer's requirements, so that providers can promote their service offer more
efficiently according to the characteristics of customers. The main purpose of our
proposal is to provide network service providers with a platform that enables
on-demand deployment of IP services while limiting, as much as possible, the cost of its
management.



3 Multi-tier Architecture for IP Services 

We consider the situation where active nodes are scattered throughout the Internet. In 
our approach, we consider active nodes as end systems rather than network devices. 
Indeed, putting the active nodes at the edge of the network has two main advantages: 
first, the “active services” are closer to end-users, reducing the response time of the 
service implemented in these nodes. Moreover, network resources located in the cen- 
ter of the network are scarce compared to those located at the edge. 




252 Gaetan Vanet et al. 



3.1 The Concept of “Basic Service Component” 

As mentioned above, new Internet services must be customizable and viewed as a
composition of basic service components rather than as a static and unified service [1,3,7]
common to all customers, as is the case today. In fact, a service can be considered
from two different views: the service as a business product and the service as the basis
of this business product - or its implementation. From the point of view of end-users,
the service is a business product they pay for. It can be an Email, Document Printing or
Video Conferencing service. Each of these services may be constructed from
several distinct underlying items - service components - integrated through a number
of possible mechanisms. The main idea of such an approach is to build complex
services on top of primitive and basic service components. These mechanisms
of integration are beyond the scope of this paper and are a subject for
future research. In this document, we will consider services as basic service
components which must be deployed to provide customers with a business service product.



3.2 Architecture Overview 

In section 2 of this document, we presented a business service model composed of
the network service provider, the application service provider and the service portal.
We defined the following multi-tier architecture based on this model. Our
architecture, depicted in figure 1, is composed of the following components:

• The service users form Tier One of this architecture. We assume these users
access the network through Points of Presence, or POPs, and use a classical Web
browser as service interface. The ubiquity of Web browsers makes them
the de facto standard for service interfaces;

• The Middle Tier is composed of three entities: the Service Broker, the Broker
Controller and the basic service components. The Service Broker plays the role of
interface between end-users and ASP servers, pre-processing the
requests sent by end-users to these servers. It also locally controls and manages
the implementation of services in active nodes. Finally, it provides end-users with
the list of the services they can use and configure. The middle tier also contains the
Broker Controller. The Broker Controller function is implemented by the Service
Portal. The ASP manages the implementation of its basic service components and
may access the Broker Controller through the Internet, for instance. The Service Broker
and the Broker Controller, considered together, play the role of the Service Portal.
Finally, the Middle Tier contains the basic service components. Each basic service
component provides one specific service, such as data compression or data encoding.
Basic service components then collaborate to provide customers with a business
service. For instance, an Internet Message Access Protocol (IMAP) daemon and an HTTP
proxy basic service component may collaborate to provide customers with an
Email service. Our approach differs from the classical three-tier architectures
in two important ways: the service components are deployed on demand, and they
can collaborate to provide customized business IP services;

• Tier Three is composed of the ASP servers and the service management
system related to each service and implemented by the corresponding ASP.








Fig. 1. Multi-tier Architecture Overview 



Figure 1 introduces the concept of “active” middleware. Actually, this middleware 
has two main functions: it enables the code downloading of the different components 
of our architecture into the active nodes; it allows the communication between these 
components. 

3.3 The Service Broker 

The Service Broker can be considered a dynamically deployable component that plays
the role of service interface between service users and ASP servers, and the role
of management interface between the Broker Controller and the service access points.
As a Network Service Contractor, the Service Broker guarantees the service by
providing the necessary service access point, consuming active node resources rented out
by the NSP. It is deployed on demand by the Broker Controller (cf. section 3.4) to the
suitable active nodes of the network. The deployment of Service Brokers depends on
the behavior of the customers, the load of the other Service Brokers or any
other conditions defined by the Service Portal. Basically, Service Brokers are located
near the edge of the network, close to end-users. The Service Broker component has
several functions: it locally manages the creation, deployment and management of
the basic service components; it handles the service requests sent by service users and
forwards these messages to the suitable service access point according to specific load
balancing policies; finally, it publishes for customers all the services they can access
or might be interested in, using the concept of the Service Home Page (SHP). A Service
Home Page is a user interface, written in HTML for instance, that can differ from one
service user to another and is customized to each user's specific needs and interests.
The SHP content is defined by the Service Portal and stored by the Broker Controller.
When the Service Broker receives a request from a service user, it gets from the Broker
Controller the SHP related to the customer connecting to the service. The basic
architecture of the Service Broker is depicted in figure 2. The Service Broker provides
the system with two distinct interfaces: the FEServiceBroker interface and the
INServiceBroker interface.

The FEServiceBroker interface is both a service and a management interface,
intended for both service clients and basic service components implemented in the
network. The FEServiceBroker interface provides two classes of methods: the first class
is invoked by the basic service components to create, initialize, delete, start, stop,
suspend and resume other basic service components implemented in the network; the
second class of methods is invoked by service clients to download the code of the
“suitable” SHP and get the authorization to connect to the service they request.

The INServiceBroker interface is a management interface used by the Broker
Controller to ask the Service Broker to create, delete, start, stop, suspend and resume the
basic service entities implemented in the active nodes of the network. The
Service Component LifeCycle Server component manages the different steps of the
lifecycle of a service component. The decision to create or delete a service
component can be made at the request of other service components or of the Broker Controller.
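
The lifecycle operations named for the INServiceBroker interface (create, delete, start, stop, suspend, resume) can be read as transitions of a small state machine. The following is a minimal sketch in Python rather than the prototype's Java and CORBA IDL; the state names, the transition table and the class ServiceComponent are assumptions made for illustration and are not taken from the paper.

```python
from enum import Enum


class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    SUSPENDED = "suspended"
    STOPPED = "stopped"
    DELETED = "deleted"


# Hypothetical transition table: (current state, operation) -> next state.
TRANSITIONS = {
    (State.CREATED, "start"): State.RUNNING,
    (State.RUNNING, "suspend"): State.SUSPENDED,
    (State.SUSPENDED, "resume"): State.RUNNING,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.SUSPENDED, "stop"): State.STOPPED,
    (State.STOPPED, "start"): State.RUNNING,
    (State.CREATED, "delete"): State.DELETED,
    (State.STOPPED, "delete"): State.DELETED,
}


class ServiceComponent:
    """A basic service component as seen by a lifecycle server (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.state = State.CREATED  # "create" is modeled by construction

    def apply(self, operation):
        """Apply one lifecycle operation, or raise if it is not allowed."""
        key = (self.state, operation)
        if key not in TRANSITIONS:
            raise ValueError(f"{operation!r} not allowed in state {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the allowed transitions in one table makes it easy to reject requests that arrive in the wrong order, whichever component issued them.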




The Service Broker also contains two other components: the Service Home
Page Client and the Access Controller. The SHP Client requests from the Broker
Controller the Service Home Page suitable for the customer requesting the service. The
Access Controller component performs the first verification of the rights a customer
may have to connect to a service. This first verification is based on data cached
by the Service Broker. If the Service Broker holds no data for the customer, it
asks the Broker Controller to complete this authorization sequence.
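
The two-level authorization check can be sketched as a cache lookup with a remote fallback. This is an illustration only: the class and method names are hypothetical, the Broker Controller is replaced by a local stub, and caching negative answers without expiry is an assumption the text does not address.

```python
class BrokerControllerStub:
    """Stand-in for the Broker Controller's authorization method (hypothetical)."""

    def __init__(self, rights):
        self.rights = rights  # customer id -> set of service names
        self.calls = 0        # counts remote authorization requests

    def authorize(self, customer_id, service):
        self.calls += 1
        return service in self.rights.get(customer_id, set())


class AccessController:
    """First-level check on cached data, falling back to the Broker Controller."""

    def __init__(self, controller):
        self.controller = controller
        self.cache = {}  # (customer id, service) -> bool

    def is_authorized(self, customer_id, service):
        key = (customer_id, service)
        if key not in self.cache:
            # No cached data: complete the authorization remotely.
            self.cache[key] = self.controller.authorize(customer_id, service)
        return self.cache[key]
```

With this arrangement, only the first request from a given customer for a given service reaches the Broker Controller; later requests are answered at the edge.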

3.4 The Broker Controller 

The Broker Controller is implemented by the Service Portal and plays the role of
manager of the service architecture. It supervises the creation, deployment and
management of the basic service components and Service Brokers (cf. section 3.3)
implemented within the different active nodes of the network. In our approach, the
deployment and management of IP services are based on the definition of rules; each
rule can be either service or network specific. Service specific rules are defined by the
Application Service Provider and detail how one specific service must be
provisioned and deployed in the network. For instance, if we consider the implementation
of a distributed caching service, one of the rules might have the following semantics:
“put the caching function at the network intersection nodes”.

Fig. 3. Broker Controller Architecture



The deployment of new services can be based on service and network statistics or
service constraints. The network specific rules are defined by the Network Service
Provider. This type of rule details how the network resources must be used in order to
implement the services. Together, these two types of rules are called the Service
Definition. The main purpose of this policy-based approach is to automate the processes of
service creation, deployment and management in the Internet. This policy-based
service deployment allows the Service Portal to deploy the services designed by
the NSP and ASP at the right time and at the right place in the network. Figure 3
represents the basic architecture of the Broker Controller component. We defined two
interfaces for the Broker Controller: the FEBrokerController interface and the
INBrokerController interface. The FEBrokerController interface provides network and
service providers with methods to manage the deployment and management of
active services within the network, to create customer profiles and to manage the set of
service rules. The INBrokerController interface provides the Service Broker with the
same methods as the FEBrokerController interface, together with methods to authorize a
client to access a specific service and to download the Service Home Page into a customer’s
web browser.
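
One possible way to picture a Service Definition is as a record that groups the rules contributed by each provider. The sketch below is an assumption about representation, not the paper's format: the field names, the free-text conditions and the NSP rule about CPU load are all invented for illustration; only the caching rule paraphrases the example in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    owner: str      # "ASP" for service-specific, "NSP" for network-specific
    condition: str  # free-text condition (hypothetical representation)
    action: str     # what to deploy or how to use resources


@dataclass
class ServiceDefinition:
    service: str
    components: list
    rules: list = field(default_factory=list)

    def rules_from(self, owner):
        """Return the rules contributed by one provider."""
        return [r for r in self.rules if r.owner == owner]


caching = ServiceDefinition(
    service="distributed caching",
    components=["cache"],
    rules=[
        # Paraphrase of the service-specific example given in the text.
        Rule("ASP", "node is a network intersection", "put the caching function here"),
        # Invented network-specific rule, for illustration only.
        Rule("NSP", "node CPU load above 80%", "do not host new components"),
    ],
)
```

Tagging each rule with its owner keeps the two rule types in one structure while still letting the Broker Controller evaluate them separately.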

The Broker Controller also contains other functional components. The Client
Profile Manager maintains all the information related to service customers: the customer
id and the list of the services each customer can access. The Service Home Page
Manager stores and generates the appropriate Service Home Page for a specific customer.
The Scheduling Module is a functional component added to the policy service. It
enables the deployment of active services in active nodes at dates specified by the
application service provider. Finally, the Traffic Analyzer periodically gets the load of the
different basic service components implemented in the network. Then, it decides,
based on the Service Definition information, to duplicate a service component or
to delete a service component which is no longer accessed by customers.

4 The Service LifeCycle 

Based on the concepts of Broker Controller and Service Broker detailed in the
previous section, this section focuses on the lifecycle of one service, considering its
creation, deployment and management. In this section, we assume that the Network Service
Provider also plays the role of Service Portal.

4.1 Service Provisioning 

The design, deployment and activation of services require multiple steps. Their
sequence is depicted in figure 4. In step 1, the ASP contacts the NSP to register its
service. This registration is done using the Broker Controller FEBrokerController
interface. At that time, the ASP announces the e-service it provides, supplying the
Broker Controller with the Service Definition information necessary for the service
deployment. The service definition specifies the basic service components which must
be deployed and the QoS requirements. The QoS requirements can depend on
the class of users, for instance. These service constraints are then modeled as rules by
the ASP and initialized by the ASP in the Broker Controller through the
FEBrokerController interface. For instance, the ASP defines that the provisioning of its Email
service requires the implementation of an Internet Message Access Protocol (IMAP) daemon
and an HTTP proxy. In addition, it can specify that the response time of the “HTTP proxy”
basic service component must be lower than 10 seconds.

In step 2, the dial-up user connects to a Point of Presence (POP) of the network and
sends a request to the Service Broker. On receiving this request, the Service Broker
contacts the Broker Controller to get the Service Home Page. This SHP can be an HTML
page, for instance (step 3). Then, the customer uses this SHP to enter his choice concerning
the service he wants to access (step 4). The Service Broker performs a first access
authorization check for this customer. If it has no cached data concerning this
user, it invokes the user authorization method provided by the Broker Controller
INBrokerController interface (step 5). The Broker Controller authenticates the customer
and checks whether he can access the required service. If he can, the Broker Controller
decides, based on the Service Definition data previously defined, which basic service
components must be deployed and their “suitable” location in the network
(step 6). The location of this new service in the network is important because, as
mentioned in [2], the performance of a service mainly depends on the number and the
position of application servers implemented within the network. In the example of
figure 4, a new basic service component is created close to the POP. This basic
service component can be viewed as a new Service Access Point, created on
demand. When the creation of the service component that provides customers with the
required business service is complete, the customer gets the location of this new
Service Access Point from the Service Broker. Then, he can access the service
(step 7).

So, the deployment of one business service in the network is done at the request
of end-users, and only when it is requested. This scheme is an on-demand service
subscription. This dynamic service deployment is done to optimize the use of network
resources. Such an approach enables Network Providers to use their resources more
efficiently, which bears on the classical debate over whether over-provisioning is likely
to be more cost-effective than rationing resources in a multi-service network environment.



4.2 Service Management 

Service management is performed by the Broker Controller, according to the rules
specified by the service providers. The Broker Controller periodically gathers the load
of the different basic service components implemented throughout the network. The
system can collect service quality information through management agents
implemented in the different active nodes of the network. The load or the response time of a
basic service component are examples of the data the system can collect as
service quality information. Then, the Broker Controller compares these values with
the thresholds defined previously by the service providers. If a threshold is not
respected, the Broker Controller can react by modifying the configuration of the service. If
a basic service component is overloaded, the Broker Controller can decide to deploy
new service components. If a service component becomes idle, it is removed. So,
the network service provider is responsible for the provisioning and management of
IP services while respecting the Service Definition rules defined by the wholesaler of
the service, the Application Service Provider.
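
The threshold comparison described in this section can be sketched as a pure function that maps measured loads to management actions. The function name, the two thresholds and the use of a single load figure per component are assumptions for illustration; the paper does not specify how thresholds are expressed.

```python
def plan_actions(loads, overload_threshold, idle_threshold):
    """Map each component name to 'duplicate', 'remove', or 'keep'.

    loads: component name -> measured load (for example requests per second).
    Both thresholds are hypothetical parameters taken from the service rules.
    """
    actions = {}
    for name, load in loads.items():
        if load > overload_threshold:
            actions[name] = "duplicate"  # overloaded: deploy another instance
        elif load <= idle_threshold:
            actions[name] = "remove"     # idle: free the active node resources
        else:
            actions[name] = "keep"
    return actions
```

For example, with an overload threshold of 100 and an idle threshold of 0, a component measured at 120 would be duplicated and one measured at 0 would be removed.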

5 Implementation

This section describes the prototype we are implementing to demonstrate the validity
of our approach. In this prototype, we consider the provisioning of a content delivery
service. The basic idea of all Content Delivery Services is the same: make copies
of important content and move them to servers located near the content users.
Having content near the user has a clear and direct benefit in performance. Basically,
the performance of a content delivery service mainly depends on the location of the
different replicas. However, deploying these replicas in suitable locations is a very
complex task, especially considering the situation of mobile users who can access a service
from anywhere. The architecture presented in the previous sections of this paper can
provide a suitable answer to this situation.

We are implementing this prototype over the NEC front-end middleware [4]. Like
the Application Layer Active Networking proposal [5,9], the front-end middleware
enables a good integration with existing IP networks by simply overlaying the basic
network infrastructure. The front-end middleware enables the dynamic
deployment of pieces of code, called front-ends, in active nodes located at the edge of the
network. A front-end is devoted to the preliminary processing of data going from
clients to servers. The use of the front-end concept brings several benefits: putting the
front-ends at the edge nodes of the network reduces the service response time;
front-ends provide customers with more user-friendly and customized interfaces;
application servers can subcontract specific tasks to these front-end elements and
focus their resources on fundamental tasks; finally, front-ends limit the network traffic
by only forwarding to servers the data they cannot process. This enables the network to
provide some part of a business service, going beyond “traditional” approaches
like data caching or load balancing, which only provide data.

The provisioning of the content delivery service requires the implementation of
HTTP daemons in different active nodes of the network. In our approach, each HTTP
daemon is considered a basic service component. The service user requests a
specific web page from one of these daemons. If this daemon does not contain the data, the
request is automatically forwarded to the corresponding application server or to
another HTTP daemon. In this implementation, the Service Brokers, the Broker Controller
and the HTTP daemons are implemented in the NEC front-end middleware as front-ends. All
of these components are implemented in Java, and the CORBA IDL is used to define
the different interfaces of the Service Brokers and Broker Controller detailed in
sections 3.3 and 3.4. The configuration of our prototype is given in figure 5. For further
details concerning the front-end middleware functionalities, the reader should refer to
[4]. It should be noted that only the main functional components of the Service Broker
and Broker Controller are represented in figure 5.
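
The forwarding behavior of an HTTP daemon that does not hold the requested page can be sketched as follows. This is a simplified in-memory model, not the prototype's front-end code: the class HttpDaemon, its fields and the node names are hypothetical, and because the text does not say whether a daemon keeps a copy of a forwarded response, the sketch does not cache it.

```python
class HttpDaemon:
    """A content-delivery replica that forwards requests it cannot serve."""

    def __init__(self, name, pages=None, upstream=None):
        self.name = name
        self.pages = dict(pages or {})  # path -> content held locally
        self.upstream = upstream        # another daemon or the application server

    def get(self, path):
        """Return (content, name of the node that held it)."""
        if path in self.pages:
            return self.pages[path], self.name
        if self.upstream is None:
            raise KeyError(path)
        # Local miss: forward the request instead of failing.
        return self.upstream.get(path)


origin = HttpDaemon("origin", {"/index.html": "<h1>hello</h1>"})
edge = HttpDaemon("edge", {}, upstream=origin)
```

Here a request to the edge daemon for a page it lacks is answered by the origin, which mirrors the fallback to the application server or to another daemon.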

Fig. 5. Prototype Configuration

Figure 6 shows the sequence diagram of the creation and deployment of the
“content delivery service”, which is composed of a single basic service component, an HTTP
daemon. The sequence proceeds with the following steps:

1. The Application Service Provider registers the identity of its service, the path of
the code and the location of the code server providing the code necessary to
implement the service.

2. The ASP creates the profile for a new client, including a personal id and the list of 
all the services he can access. In our example, the “content delivery service” is 
registered. 

3. Then, the ASP specifies the rules, or Service Definition data, necessary for the im- 
plementation of the service. Now, the service has been created and a client profile 
has been defined. 

4. Afterwards, the client connects to the Service Broker using his classical Web
browser. On receiving this request, the Service Broker downloads from the Broker
Controller the Service Home Page specific to this service user using the getSHP()
method of the INBrokerController interface.

5. The client chooses the “content delivery service”. It invokes the FEServiceBroker
interface with the authorizeClient() method to notify the Service Broker of this
choice. The Service Broker then checks whether this client can connect to the
service. As he can, it invokes the INBrokerController with CreateServiceEntity() to
ask the Broker Controller to create the Service Access Point providing the “content
delivery service”.

6. According to the Service Definition data, the Broker Controller decides the
location of the HTTP daemon and creates this service component using the
functionalities of the NEC front-end middleware [4].

7. The Broker Controller invokes the INServiceBroker with initializeAPEntity() to
initialize the state of the HTTP daemon. Finally, an acknowledgement is sent back to
the Service Broker. Then, the authorizeClient() method returns the location of the
HTTP daemon the user can now access.
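
The sequence of steps above can be traced with a small in-process sketch. It is written in Python rather than the prototype's Java and CORBA IDL; the log entries follow the method names mentioned in the steps, but every signature, argument, return value and the placement string "node-near-pop" are assumptions. For brevity the Service Broker reads the client profiles directly instead of holding its own cached copy.

```python
class BrokerController:
    def __init__(self):
        self.services = {}  # service name -> code location (step 1)
        self.profiles = {}  # client id -> set of service names (step 2)
        self.log = []       # order in which interface methods were invoked

    def register_service(self, name, code_location):
        self.services[name] = code_location

    def create_profile(self, client_id, services):
        self.profiles[client_id] = set(services)

    def get_shp(self, client_id):
        # Step 4: the Service Home Page lists the client's services.
        self.log.append("getSHP")
        return sorted(self.profiles.get(client_id, ()))

    def create_service_entity(self, service, broker):
        # Steps 6-7: choose a location, create the component, initialize it.
        self.log.append("CreateServiceEntity")
        location = "node-near-pop"  # placeholder for the placement decision
        broker.initialize_ap_entity(location)
        return location


class ServiceBroker:
    def __init__(self, controller):
        self.controller = controller

    def home_page(self, client_id):
        return self.controller.get_shp(client_id)

    def authorize_client(self, client_id, service):
        # Step 5: check the client, then ask for a Service Access Point.
        self.controller.log.append("authorizeClient")
        if service not in self.controller.profiles.get(client_id, ()):
            return None
        return self.controller.create_service_entity(service, self)

    def initialize_ap_entity(self, location):
        self.controller.log.append("initializeAPEntity")
```

A client that is registered for the service receives the location of the new Service Access Point; an unknown client receives None and no component is created.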

The “Content Delivery Service” is based on the use of several HTTP daemon basic
service components deployed throughout the network. The configuration of this
architecture is managed using the active cache approach [13]. The active cache
mechanism makes it possible to automatically reconfigure the hierarchical
architecture composed of these basic service components. As we mentioned in section
3.2 of this document, the active network is a very suitable solution to support our
proposed multi-tier architecture. To complete the implementation of the “content delivery
service” presented in this section, we could make better use of active network
technology. Indeed, this technology can improve our prototype in the following
ways: it enables the dynamic deployment of the Service Brokers according to the
pattern of client accesses [2]; it also improves the management mechanism of our
multi-tier architecture by deploying management functions and event filters specific
to the basic service components implemented in the area [3]; moreover, active
network technology facilitates the collection of service information by flexibly
installing concentrators within the network [3,8].



6 Conclusion 

This paper has described a multi-tier architecture based on active networking
techniques for creating, deploying and managing advanced IP services. The main
philosophy of our proposal is to enable network service providers to provide service
customers with customizable IP services that can be deployed on demand. The definition
of a new business service relies on the composition of basic service components. Then, the
deployment of this new service is based on the Service Definition information defined
by both the Application and the Network Service Provider. This policy-based
approach makes it possible to provide IP services quickly while limiting as much as
possible the use of network resources.

In the immediate future, we intend to verify the robustness of the mechanism of 
service composition. Our solution is currently based on the use of a workflow func- 
tion which dispatches the service requests to the different basic service components 
composing a business service, respecting a sequence specific to each business service. 
We also plan to study the best way to define and interpret the network and service 
rules. 
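
The workflow function mentioned in this conclusion, which dispatches a request to the components of a business service in a service-specific sequence, might look like the following sketch. Everything here is hypothetical: the component names, the order in which they run and the idea of recording a trace in the request are illustrative choices, since the paper leaves the composition mechanism to future work.

```python
def make_component(name):
    """Build a hypothetical basic service component that records its name."""
    def component(request):
        return {**request, "trace": request.get("trace", []) + [name]}
    return component


# Each business service names its components in the order they must run
# (the orders shown are assumptions, not taken from the paper).
WORKFLOWS = {
    "email": [make_component("http-proxy"), make_component("imap-daemon")],
    "content delivery": [make_component("http-daemon")],
}


def dispatch(service, request):
    """Pass the request through the service's components in sequence."""
    for component in WORKFLOWS[service]:
        request = component(request)
    return request
```

Keeping the sequence as data, separate from the components themselves, is one way to let the same basic service component take part in several business services.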

Abstract. Programming the network infrastructure significantly
enhances its flexibility and favors fast deployment of new protocols, but 
also introduces serious security risks. It is crucial to protect the whole 
distributed infrastructure, especially its availability in case of denial-of- 
service attacks. A security framework for programmable networks may 
provide security solutions at different levels of abstraction. Active 
networks mainly propose a network-layer approach, by extending the 
packet format to include security information. Mobile code technologies 
tend to provide security tools at the application layer to integrate with 
standard external infrastructures, such as public key ones. The paper 
describes the security frameworks of several programmable network 
proposals and points out the dis/advantages related to the adopted 
abstraction level. This comparison suggests considering an integrated
security framework capable of choosing the service-specific balance 
between application-layer flexibility and network efficiency. To this 
purpose, the paper presents the architecture of a Programmable Network 
Component (PNC) that integrates security solutions at different layers 
and that has been implemented by using a mobile agent programming 
environment. 



1 Introduction 

The convergence of telecommunication systems and the Internet proposes a global 
shared network infrastructure with new value-added services for all participants (final 
users, service providers, and network operators) [1]. The management of the 
infrastructure is increasingly complex, because of its global dimension, of network 
resources heterogeneity, of the request for dynamicity in offered services, and of 
increasing user requirements and expectations. To satisfy these requirements, the 
traditional end-to-end model of interaction in the network is evolving toward an 
alternative scenario where the network infrastructure can play an active execution role.
In particular, in Programmable Networks (PN), interconnection components can 
perform computations on transmitted data and can be programmed by dynamically 
injecting service/user-specific code [2]. Several approaches and technologies have 
been proposed for the realization of PN, and can be roughly classified on the basis of 
the principal abstraction layer; the term Active Networks (AN) usually identifies the
approaches that achieve programmability by working mainly at the network layer, 
whereas we consider Mobile Agents as an enabling technology that achieves 
programmability at the application layer [2]. 

Many research groups have recently claimed PN suitability for a wide spectrum of 
applications [3] [2]. PN can help in fast prototyping and deploying new network-layer
protocols (e.g., for congestion control and topology-aware reliable multicast). Other 
proposals employ network programmability to deal with application-specific 
requirements, as in Web caching and in dynamic adaptation of multimedia streaming 
to currently available resources [4] [2]. All application scenarios require that PN 
environments provide adequate answers to the security issues raised by network 
programmability. The main security concern is to achieve a full protection of the 
shared network infrastructure against illegal accesses and denial-of-service attacks. 

The paper discusses some different security solutions in the PN research area 
depending on their specific level of abstraction. Some approaches in the AN area 
suggest the adoption of security mechanisms at the network layer. They usually tend to 
standardize security data by directly enclosing them into packets [5] [6]. Other 
approaches propose solutions at a higher level of abstraction, to exploit the flexibility 
and extensibility typical of the application layer [3]. On the one hand, network-layer 
approaches focus on efficiency but often lack flexibility and dynamicity. On the other 
hand, application-layer solutions permit integration with existing infrastructures for
rapid prototyping and deployment, but often do not achieve comparable performance.

The paper presents the architecture of a Programmable Network Component 
(PNC), designed for fast prototyping and deployment of protocols/services in the global,
heterogeneous and untrusted Internet environment. In particular, the paper focuses on 
security aspects and proposes the integration of network- and application-layer 
solutions. An integrated approach to security permits service designers and system 
managers to satisfy different security requirements, from high dynamicity in the 
modification of security data to strict respect of timing constraints, from 
interoperability with existing infrastructures to scalability, crucial for handling a large 
number of users. We claim that only a solution that integrates mechanisms and tools at 
both layers can achieve the efficiency of the network layer together with the flexibility 
of application-layer solutions. The proposed PNC architecture has been implemented 
by using a Mobile Agent (MA) framework called Secure and Open Mobile Agent 
(SOMA) [7]. The SOMA platform exploits the Java technology for agent serialization, 
dynamic class loading, networking support and for the ubiquitous availability of the 
Java virtual machine. 

2 Security Issues in Programmable Networks

Network programmability raises significant concerns from the security point of view. 
Possible threats and attacks to PN are far more critical than in passive network 
infrastructures. In fact, the possibility of injecting code to modify the behavior of 
network components can compromise not only the correct operations of one node but 
also the availability of the whole network. To face these threats, the design of a 
general PN environment should grant an adequate level of security from its first
phases; security cannot be considered an add-on to insert only a posteriori.

The security framework of a PN environment should be based on a thorough 
security model to protect all involved entities, both network infrastructure (the set of 
all programmable nodes) and active packets (the single pieces of code injected into 
the network). More in detail, it is necessary to protect: 

• the network resources against malicious behavior of active packets, to maintain the 
availability of the shared network infrastructure; 

• the active packets against attacks from malicious network nodes, to grant the 
correctness of the service provided by active packets over the whole network path 
between service users and providers; 

• the active packets when transiting in the network, to detect possible modification 
and to prevent malicious sniffing; 

• the active packets from interfering with each other, to avoid the possibility of 
combined attacks performed by colluding active packets. 

The PN security framework should answer the fundamental issues of authentication, 
authorization, secrecy, and integrity and should provide the requested models of trust. 
Any trust model defines who or what in the system is considered trusted, in what way, 
and to what extent [8]. 

Authentication associates active packets with responsible principals, 
where principals represent the subjects that request the operations, e.g., an individual, 
a corporation, a service provider, or a network administrator. In practice, any 
principal can be associated with a personal public/private key pair and digitally sign 
packets to ensure the correct identification of their responsible party. The 
authentication process safely verifies the correspondence between principal identities 
and keys. Most authentication solutions delegate key lifecycle management to Public 
Key Infrastructures (PKIs) [9]. Authentication also ascertains the origin of active 
packets by associating them with either their principal or their responsible role. A role 
models a collection of rights and duties that characterizes a particular position within 
an organization. A role-based model facilitates the administration and management of 
a large number of principals, by simplifying the dynamic handling of principals and 
permissions [3]. 
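
The signing step described above can be illustrated with a minimal Java sketch. It assumes an RSA key pair per principal and the SHA256withRSA algorithm. The class name PacketSigner and its methods are hypothetical and are not taken from any of the surveyed systems. In a real deployment a PKI would certify the public key.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical helper: signs and verifies the code carried by an active packet. */
final class PacketSigner {
    private static final String ALGORITHM = "SHA256withRSA"; // assumed algorithm

    /** Generates a key pair for a principal (a PKI would normally certify it). */
    static KeyPair newPrincipalKeys() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Produces the principal's signature over the packet code. */
    static byte[] sign(byte[] packetCode, PrivateKey key) {
        try {
            Signature signature = Signature.getInstance(ALGORITHM);
            signature.initSign(key);
            signature.update(packetCode);
            return signature.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true only if the signature matches the packet code and the public key. */
    static boolean verify(byte[] packetCode, byte[] sig, PublicKey key) {
        try {
            Signature signature = Signature.getInstance(ALGORITHM);
            signature.initVerify(key);
            signature.update(packetCode);
            return signature.verify(sig);
        } catch (GeneralSecurityException e) {
            return false; // malformed signatures count as failed authentication
        }
    }
}
```

A receiving node would call verify with the public key taken from the principal's certificate and would drop the packet when the call returns false.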

Authorization grants active packets the permissions to operate on the resources of 
the network infrastructure. Several authorization models are possible: the most 
common is the Access Control Lists (ACL) model that describes and enforces the 
access rights of principals/roles on a resource. More generalized models, such as the 
Trust Management model, can provide a unified framework for the specification and 
interpretation of security policies in distributed systems [10]. 

In addition, the security infrastructure should prevent the possibility of modifying 
and inspecting active packet contents (integrity and secrecy issues) while migrating 
over untrusted networks and executing in malicious nodes. When considering the 
protection of an active packet in transit over an end-to-end communication channel, 
traditional cryptographic techniques can establish secure channels to ensure both 
integrity and secrecy between end-to-end network nodes. This approach is not 
sufficient in the PN area where intermediate hosts have to verify incoming active 
packets before their execution. This requires a hop-by-hop control that implies the 
establishment of a trust relationship between all involved intermediate nodes [2]. 

In PN environments another issue concerns the possibility of controlling the behavior 
of incoming active packets during execution. Several PN infrastructures confine the 
execution of different active packets to isolated environments, both to prevent reciprocal 
interference and to avoid possible collusion against the hosting network node, and 
provide monitoring services to detect excessive resource consumption that can lead 
to denial-of-service attacks [11]. 

A general security framework for PN environments should provide strategies and 
mechanisms of solution for all the above issues. The same infrastructure can offer 
different solution implementations to make available different qualities of security 
service. In any case, some general properties have to be considered to deal with global 
and heterogeneous distributed systems such as PN. 

The basic requirement to satisfy is the durability of design efforts. Once a 
system has been completely deployed, its lifetime depends strictly on its capacity to 
follow evolving needs. In other words, the security model should be flexible 
enough to accommodate any suitable variation and should extend easily to embody 
reasonable additions to components. These extensibility and flexibility properties can 
be achieved along synergic guidelines, preventive solutions and design technologies 
that favor the addition/substitution of system modules. For instance, the association of 
one principal with several roles can help in changing the principal permissions to 
adapt to different and evolving environments. The same holds for the versioning of 
security tools, which can coexist in different versions within the same system at the 
same time, if the design maintains sufficient information to distinguish between the 
different installations. 

Another requirement is dynamicity. PN are global systems and the availability of 
the network infrastructure is a necessary condition. For this reason, all security 
solutions have to maintain system effectiveness while incorporating variations. For 
instance, while a programmable router is receiving a new protocol version that affects 
the handling of specific streams, routing operations should continue and no 
packets (either active or normal) should be lost. 
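
One way to picture this requirement is a handler that can be replaced atomically while packets keep flowing. The following Java sketch is only an illustration under that assumption. The class HotSwapRouter and the modelling of packets as strings are hypothetical and do not describe any surveyed system.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

/** Hypothetical router whose protocol handler can be swapped without stopping. */
final class HotSwapRouter {
    private final AtomicReference<UnaryOperator<String>> handler;

    HotSwapRouter(UnaryOperator<String> initialVersion) {
        this.handler = new AtomicReference<>(initialVersion);
    }

    /** Installs a new protocol version; packets already being handled finish with the old one. */
    void install(UnaryOperator<String> newVersion) {
        handler.set(newVersion);
    }

    /** Handles a packet with whichever version is current when the packet arrives. */
    String route(String packet) {
        return handler.get().apply(packet);
    }
}
```

Because the reference is read once per packet, no packet is ever left without a handler during the update.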

A final but fundamental consideration for the implementation of a PN security 
infrastructure, which influences all design choices, is to meet an adequate level of 
performance. PN call for security solutions capable of meeting cost requirements and 
of achieving a suitable trade-off between the necessary security degree and the usage 
of time and resources. 




266 Paolo Bellavista et al. 



3 Security Solutions in Programmable Networks 

The aim of this section is not to provide a general survey on the state-of-the-art of the 
PN research, but to organize and give some technical insights about the projects that 
have specifically worked on solutions at different levels of abstraction for the PN 
security issues. 

According to this guideline, we first present two architectures that base their 
security solutions on the insertion of security data directly within transmitted packets. 
They are the Secure Active Network Environment (SANE), employed in the 
SwitchWare project at the University of Pennsylvania [12], and the Smart Packets (SP) 
proposal of BBN [13]. Other PN approaches tend to rely on security mechanisms 
and tools that operate at the application layer: Section 3.2 presents some trends 
emerging in these PN architectures and gives some insight into the Agent-Based 
Security Architecture for the Active Network Infrastructure (ABSANI) developed at 
GMD Fokus [3]. 



3.1 Security Data within PN Packets 

Several research efforts have addressed the issue of defining new formats for network 
packets to include security-relevant information [5] [6]. These activities propose a 
structure of packets that permits efficient security processing at packet 
forwarding/reception, on the basis of the security data contained in packet headers. 

The most recognized work toward the standardization of the PN packet format is 
the Active Network Encapsulation Protocol (ANEP) [5] that proposes a common base 
to increase interoperability among different PN projects. The main purpose of ANEP 
is to identify quickly the environment in which to evaluate incoming active packets by 
examining the content of specific fields in packet headers. ANEP packets can be 
transmitted directly over the link layer, or they can be encapsulated within an existing 
network protocol such as IP. 

The current security support in ANEP is limited to the provision of one-way 
authentication with X.509 and SPKI self-signed certificates [9]. All the network-layer 
security approaches in the PN area have proposed specific extensions to the ANEP 
header to provide and manage security issues in a more general way. 

3.1.1 Secure Active Network Environment 

SANE provides a layered security architecture to ensure the correct behavior of 
incoming active packets. At the lower layers, SANE guarantees that its PN 
components start in an expected state by exploiting a secure bootstrap mechanism, 
called AEGIS [12]. The higher layers are responsible for active packet 
authentication/encryption, for the provision of a restricted execution environment 
based on a type-safe dedicated language for active packets [3], and for the safe 
partitioning of separate name spaces to the different node services. 

SANE extends the ANEP format to support packet authentication and secrecy. The 
approach is similar to the IPsec protocol and to its provided security associations [6]. 
The SANE packet includes a Security Parameter Index (SPI) to identify uniquely the 
corresponding security association. Figure 1 shows the SANE authentication header 
that provides a first basic mechanism to detect replay attacks and that ensures packet 
origin and data integrity. 

SANE employs a secret-key scheme that requires a preliminary application-layer 
negotiation, called Key Establishment Protocol (KEP), to determine security 
associations between all involved PN nodes and between the PN infrastructure and its 
users. At bootstrap, the PN nodes verify the accessible network topology and perform 
KEP steps with adjacent nodes. After two PN parties have achieved mutual 
authentication and agreed on the utilization of specified secret keys and cryptographic 
algorithms, they can start to exchange authenticated and encrypted active packets. Any 
PN node stores negotiated parameters locally until the corresponding security 
association is broken. In particular, active packets can follow a path that involves a 
large number of PN nodes, such as in multiple-hop PN protocols, if all involved PN 
nodes have mutually established security associations. 

SANE exploits secret-key-based security for the sake of performance and limits 
public-key usage to the security association phase. In addition, the 
information required to perform security checks is maintained locally at the active 
nodes: when receiving an active packet, the SANE node locally retrieves the security 
parameters indexed by the SPI to complete the verification of the authenticator 
integrity. 
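
The per-packet check described above (look up the security association by SPI, then verify the authenticator with the shared secret) can be sketched as follows. This is an illustration, not the SANE implementation. The HmacSHA256 algorithm, the use of a sequence number for replay detection, and all class and method names are assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical per-node table of security associations, indexed by SPI. */
final class SecurityAssociationTable {
    private static final String MAC_ALGORITHM = "HmacSHA256"; // assumed algorithm

    private static final class Association {
        final SecretKeySpec key;
        long lastSequence = -1; // highest sequence number accepted so far

        Association(byte[] secret) {
            this.key = new SecretKeySpec(secret, MAC_ALGORITHM);
        }
    }

    private final Map<Integer, Association> bySpi = new HashMap<>();

    /** Records the secret agreed during the preliminary negotiation. */
    void establish(int spi, byte[] secret) {
        bySpi.put(spi, new Association(secret));
    }

    /** Computes the authenticator over the sequence number and the payload. */
    byte[] authenticator(int spi, long sequence, byte[] payload) {
        Association sa = bySpi.get(spi);
        if (sa == null) {
            throw new IllegalArgumentException("unknown SPI " + spi);
        }
        return mac(sa, sequence, payload);
    }

    /** Accepts a packet only if the SPI is known, the MAC matches, and it is not a replay. */
    boolean accept(int spi, long sequence, byte[] payload, byte[] authenticator) {
        Association sa = bySpi.get(spi);
        if (sa == null) {
            return false;
        }
        if (!MessageDigest.isEqual(mac(sa, sequence, payload), authenticator)) {
            return false;
        }
        if (sequence <= sa.lastSequence) {
            return false; // already seen: treat as a replay
        }
        sa.lastSequence = sequence;
        return true;
    }

    private static byte[] mac(Association sa, long sequence, byte[] payload) {
        try {
            Mac mac = Mac.getInstance(MAC_ALGORITHM);
            mac.init(sa.key);
            mac.update(ByteBuffer.allocate(Long.BYTES).putLong(sequence).array());
            return mac.doFinal(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

All state needed for the check is local to the node, which is the property the text attributes to the SANE design: no public-key operation and no remote lookup is needed on the packet path.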

The SANE authorization support exchanges information about user permissions 
over authenticated and encrypted channels, established by using ANEP-compliant 
packets. In particular, KeyNote-based credentials [10] specify the policies that govern the 
operations of active packets on system resources. 




Fig. 1. The SANE packet format 

3.1.2 Smart Packets 

SPs have been proposed in a DARPA project that focuses on the application of PN to 
network management. SP distinguishes two different modes, end-to-end and hop-by-hop. 
In the end-to-end mode, only SP endpoints can execute SP protocols, while in the 
hop-by-hop mode the source, the destination and all SP intermediates actively 
participate to deploy the active protocol. 

As for the adopted security solutions, SP authenticates the origin of 
active packets, checks their integrity, and provides a confined and controlled 
execution environment. A dedicated specialized language, Sprocket, limits the 
operations permitted to SP active packets. 

SP are encapsulated within ANEP via the definition of a specific SP header (basic 
authenticator), shown in Figure 2. The authenticator makes it possible both to identify the 
origin of the packet and to verify the integrity of its non-mutable portions, by 
exploiting public-key algorithms. SP designers extended the ANEP header to 
accommodate the hop-by-hop mode, which models the case of intermediate SP nodes 
that operate and transform the active packet contents. To avoid the need of an integrity 
check at any hop, SP carry an authenticator that omits the payload and the packet 
length field in the ANEP header. 

Fig. 2. The SmartPacket format 



On reception of an SP packet, its origin and the integrity of its non-mutable parts 
are verified. Any SP node checks the received authenticators on the basis of user 
certificates, either included in the SP payload or requested from an external 
application-layer PKI. There is no limit to the number of certificates that can be included within 
an SP packet, obviously apart from the maximum packet size. X.509 standard 
certificates directly enclosed into SP packets reduce the space left for the code, but 
may significantly improve the efficiency of security controls. The SP performance, in 
fact, is tightly connected to the local availability of needed certificates at the 
intermediate SP nodes. 

If the packet origin and integrity verification process fails, the packet is discarded; 
otherwise, the packet enters the authorization process that employs ACL mechanisms 
to control active packet execution. 
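
The idea of an authenticator that covers only the non-mutable portions of a packet can be sketched in Java as follows. The packet layout, the field names and the SHA256withRSA algorithm are assumptions made for illustration and do not reproduce the SP wire format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Hypothetical packet layout; field names are illustrative only. */
final class SmartPacketSketch {
    final String origin;     // non-mutable: principal that injected the packet
    final String programId;  // non-mutable: identifier of the carried program
    byte[] payload;          // mutable: intermediate nodes may rewrite it
    byte[] authenticator;    // signature over the non-mutable fields only

    SmartPacketSketch(String origin, String programId, byte[] payload) {
        this.origin = origin;
        this.programId = programId;
        this.payload = payload;
    }

    /** Bytes covered by the authenticator: payload and length are left out. */
    private byte[] coveredBytes() {
        return (origin + '\n' + programId).getBytes(StandardCharsets.UTF_8);
    }

    void sign(PrivateKey key) {
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initSign(key);
            s.update(coveredBytes());
            authenticator = s.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean verify(PublicKey key) {
        if (authenticator == null) {
            return false;
        }
        try {
            Signature s = Signature.getInstance("SHA256withRSA");
            s.initVerify(key);
            s.update(coveredBytes());
            return s.verify(authenticator);
        } catch (GeneralSecurityException e) {
            return false;
        }
    }

    /** Convenience key generation for experiments. */
    static KeyPair newKeys() {
        try {
            KeyPairGenerator g = KeyPairGenerator.getInstance("RSA");
            g.initialize(2048);
            return g.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this coverage an intermediate node can rewrite the payload in hop-by-hop mode without invalidating the authenticator, while any change to the origin or program identifier is detected. The trade-off is that the payload itself is not protected by this authenticator.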



3.2 Security Solutions at the Application Layer 

Most PN proposals implement security solutions completely at the application level, 
without affecting the content of the transmitted active packets. Main motivations are 
the integration with mechanisms, tools and infrastructures already developed for 
securing distributed services, the fast prototyping of security solutions via software 
simulations of network components, and the simplified support for flexible and 
programmable security in PN. 

All these PN proposals have chosen Java as the implementation technology. This 
choice is motivated not only by Java portability over heterogeneous platforms, but 
also by its security-related properties, such as safety (strong typing, lack of pointers, 
automatic memory management) and the availability of security mechanisms both at the 
language level and in the run-time support [14]. 

For instance, the Intel framework for PN (Phoenix) exploits Java authorization and 
access control to rule the access to active node resources by mobile agents that 
implement congestion analysis and intrusion detection active protocols [15]. The Java 
authorization mechanisms are extended with proprietary monitoring and management 
functions that make it possible to change agent priority levels and to reject resource 
requests dynamically depending on current resource load. Another example is the Lucent PN 
prototype for distributed network management where legacy routers are enhanced with 
a Java-based active engine that runs on a general-purpose workstation [16]. The 
Lucent system exploits the standard Java SecurityManager to avoid possible 
interference between different flows of active packets and to control the associated 
session environments at run-time, by preventing the access to native methods and to 
some protected parts of the file system. 

We give in the following some details of the ABSANI architecture because it is an 
MA-based PN system specifically developed with the goal of providing a flexible, 
open and interoperable security framework. ABSANI is Java-based, and its 
developers are skeptical about the introduction of dedicated programming languages 
for the PN area [2] [3] [13]. 

ABSANI completely isolates the execution of injected agents into abstraction 
localities called places to prevent any interference among executing mobile agents. In 
particular, several isolated places can be concurrently present on the same ABSANI 
node. A resource manager component acts as a mediator in the interactions between 
agents and node resources. The resource manager can also provide the basic 
mechanism for auditing: it can collect the data generated by network activities to 
identify the users responsible for security breaches. In addition, it provides control 
and management operations to change the overall system behavior, e.g., the 
modification of local security policies is only allowed from a dedicated place 
responsible for node management. 

Agent authentication is based on credentials that permit the association of agents 
with responsible principals and the control of agent actions according to the local 
security policies. Credentials can vary to include standard X.509 and SPKI 
certificates, the hash of packet contents, the list of its signers and their signatures, etc. 
The granted permissions result from the intersection of two policies, at the place and 
at the node level. Policies can be administered via application-layer management 
tools; abstractions such as groups and roles for principals can further enhance policy 
manageability [3]. 
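
The intersection of place-level and node-level policies mentioned above can be illustrated with a short Java sketch. Modelling permissions as plain strings and the class name PolicyIntersection are assumptions made for the example. ABSANI's actual policy representation is not described here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: an agent is granted only what both policies allow. */
final class PolicyIntersection {
    static Set<String> granted(Set<String> nodePolicy, Set<String> placePolicy) {
        Set<String> result = new HashSet<>(nodePolicy); // copy, inputs stay untouched
        result.retainAll(placePolicy);                  // keep permissions present in both
        return Collections.unmodifiableSet(result);
    }
}
```

A permission therefore has to be allowed at both levels: tightening either policy immediately narrows what agents in that place can do.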

3.3 Comparing the Approaches: Security at which Layer? 

The above discussion has shown that various frameworks can address security at 
different levels of abstraction, by exploiting features typical of either the network 
layer or the application one. The awareness of the advantages deriving from the 
security approaches at different layers favors a more knowledgeable choice in the 
trade-off between security requirements and expected performance. The correct choice 
impacts crucially on the acceptance of deployed PN protocols and can widen the range 
of application areas of PN secure services. 

Other areas have already faced a similar debate about security provision at 
different layers. The request for Web secure services has motivated the introduction of 
application-layer secure protocols, such as Secure HTTP and SSL. Their extensive 
usage has stressed the need for more efficient solutions that can be provided by 
working at the network layer as in the IPsec proposal. However, the question of 
which layer should provide security is still open [6]. 

The main advantage of network-layer solutions for PN is efficiency in exchanging 
authenticated and encrypted packets. The encapsulation of security information within 
packet headers makes it possible to perform security checks at the network layer, sparing 
upper-layer protocols the processing of packet security. However, some security issues 
can only be dealt with at higher levels of abstraction, e.g., the management of 
authentication and authorization services. For instance, SANE can achieve the 
performance typical of network-layer solutions, after the security associations have 
been established between all nodes in the active packet path; but this preliminary 
negotiation phase works at the application layer. 

The application-layer approach simplifies the support to system durability because 
solutions at this level can provide flexibility, extensibility and dynamicity to the 
management of security services. For example, a policy/role management service 
demands solutions to simplify administrator operations of adding and changing 
policies/roles. The embedding of this functionality directly in network-layer packets 
could imply continuous extension of PN protocols and formats, to accommodate 
evolving requirements and facilities, and this would clash with the need of keeping the 
packet size to the minimum. In addition, application-layer solutions simplify the 
implementation of an open and interoperable security architecture, capable of 
integrating with diffused standard security frameworks that exploit state-of-the-art 
technologies, as shown in [3]. 

The above considerations motivate the design of security frameworks that integrate 
the two layers, by taking advantage of both the efficiency deriving from embedded 
protocols at the network layer and the expressive capacity stemming from solutions 
and tools at the application layer. PN administrators and users can exploit the 
frameworks to find their specific balance between performance and flexibility, 
depending on particular service requirements and the level of trust of the target 
environment of operations. 

4 The Programmable Network Component 

We have developed a framework for the fast prototyping and deployment of protocols 
and services that is based on a Programmable Network Component (PNC) to be 
installed in the nodes of the network infrastructure. The PNC supports active protocols 
and services expressed in terms of mobile agents that employ the migration and 
communication services of the SOMA programming environment [7]. The PNC is 
built on top of the JVM to exploit the Java inherent support for dynamic class-loading, 
platform independence and security. 

Mobile agents are used to distribute the behavior of active nodes out-of-band and to 
support the dynamic extension of active node functions [2]. In addition, MAs permit 
the easy installation of service- and user-specific protocols that can be injected 
dynamically into the network. Our PNC provides a secure environment for agent-based 
active protocol execution, with a wide range of security solutions at different 
layers. The main guideline is to combine the efficiency of basic security features 
implemented at the network layer together with the flexibility and extensibility of 
more advanced security tools and infrastructures provided at the application one. 

The PNC is designed to support differentiated protocols that can coexist in the 
same node without reciprocal interference. For this purpose, the PNC provides 
isolated environments for agent execution called places (see Figure 3). A component 
called dispatcher is present in any PNC node to forward incoming packets to the agent 
responsible for handling them, depending on the specific security and management 
policies of the PNC node. The PNC support ensures a protected binding between 
loaded agents and local node resources. The binding is implemented via a proxy-based 
mechanism where each node resource is encapsulated and available via a proxy 
object. Agents refer initially only to these proxies with no possibility to access 
resources directly. In particular, any resource proxy exports a Resource interface 
with the getEnvironment() method that agents have to call to access the managed 
resources. The proxy accepts requests for its resources and determines whether to 
allow the agent access on the basis of the node security policy. For instance, returned 
references can depend on the role dynamically associated with the agent principal. To 
improve efficiency, agents are forced to pass via the proxy only once at first retrieval 
of resource references, whereas afterwards they can maintain these references locally. 
Any PNC node takes advantage of a set of basic security services that include: 

• the secure transport service that provides integrity and secrecy for the transport of 
agents between PNC nodes. At agent arrival at any PNC node, security checks are 
performed to ascertain if integrity and secrecy have been preserved during agent 
transport; 

• the authentication service that accepts/discards agents on the basis of their 
corresponding user identities and roles. Cryptographic operations are performed to 
verify the X.509 identity and role certificates, possibly locally to the PNC. If the 
verification succeeds, agents are dispatched to the correct place; otherwise they are 
forwarded to a severely restricted default environment that supports anonymous 
agent execution; 

• the safe checking service that exploits the Java class verifier to ensure agent class 
file conformance to the JVM specification. Static checks avoid stack 
over/underflow, and dynamic controls are provided to guarantee the correctness of 
symbolic references. Agents not satisfying the safety property are discarded; 

• the authorization service that extends the Java security architecture to permit the 
utilization of a role-based access control model. Security policies rule the access of 
agents to all local PNC resources, both shared and private ones, that are available 
in the execution place. Authorization checks are performed by resource proxies 
when the getEnvironment() method is called. The access control policies define 
the set of permitted references for the requesting agents. 
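
The proxy-based binding and the role-dependent result of getEnvironment() can be sketched as follows. Only the names Resource and getEnvironment() come from the text. The method signature, the role names, the routing-table resource and the rejection of unknown roles are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Assumed shape of the Resource interface; the text names only getEnvironment(). */
interface Resource<T> {
    T getEnvironment(String role);
}

/** Read-only view of a hypothetical routing-table resource. */
interface RoutingTableView {
    String lookup(String destination);
}

/** Extended view that also permits updates. */
interface RoutingTableAdmin extends RoutingTableView {
    void update(String destination, String nextHop);
}

/** Proxy that hands out a reference whose capabilities depend on the caller's role. */
final class RoutingTableProxy implements Resource<RoutingTableView> {
    private final Map<String, String> routes = new HashMap<>();

    @Override
    public RoutingTableView getEnvironment(String role) {
        switch (role) {
            case "administrator":
                return new RoutingTableAdmin() {
                    @Override
                    public String lookup(String destination) {
                        return routes.get(destination);
                    }

                    @Override
                    public void update(String destination, String nextHop) {
                        routes.put(destination, nextHop);
                    }
                };
            case "user":
                // The returned object exposes lookup only.
                return destination -> routes.get(destination);
            default:
                throw new SecurityException("role not authorized: " + role);
        }
    }
}
```

The authorization decision is taken once, when the reference is obtained. Afterwards the agent works with the returned object directly, which matches the efficiency argument made in the text. The cost of this design is that a reference already handed out keeps its capabilities until the agent drops it.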

It is worth noting that some security checks, such as the integrity, secrecy, and 
authentication ones, can be implemented at the network layer to improve efficiency. 
However, even these security services need to be integrated with application-layer 
solutions in order to be exploited in large-scale networks. 

Fig. 3. PNC isolated environments for agent execution 




4.1 Network-Layer Solutions 

We have designed the PNC security architecture to achieve the degree of 
extensibility needed to permit the addition of new security features without modifying or 
recompiling existing security components. For this purpose, the PNC framework 
includes several modules that provide similar security services, but with different 
properties in terms of flexibility and performance. This makes it possible to configure and 
install the most appropriate solution depending on application-specific requirements. 

The modularity of the approach applies to the implementation of the authentication 
and the secure transport services, which are provided by either the ANEP module or 
the IPsec one (see Figure 4). The ANEP-compliant active packets exploit the TypeID 
and Option fields to indicate respectively the identifier of the involved MA-based 
protocol and the authenticator data, in the same way as in SANE. So far, there is no 
hardware implementation of ANEP-compliant routers, so the possible performance 
improvements cannot yet apply to real service protocols. We are also completing the 
implementation of the alternative IPsec module that adopts the IPsec network-layer 
protocol to provide secure transport and authentication services. We are currently 
working on the IPsec module implementation on a dedicated IPsec-compliant 
hardware component, the TimeStep VPN Gateway [17]. 

Both the ANEP module and the IPsec one can be configured to use standard public 
key cryptography mechanisms and X.509 certificates that can be distributed, revoked 
and suspended by an external application-layer PKI [9]. The integration of both 
modules with a PKI can further simplify the modularity and interchangeability of the 
implementations. 



4.2 Application-Layer Solutions 

Advanced application-layer security services are implemented on top of the basic 
security services to improve the manageability, scalability, flexibility, and dynamicity 
of the basic security services (see Figure 4). 

The certificate management service is used to enhance the manageability and 
scalability of the secure transport and authentication services by supporting 
keys/certificates distribution, revocation and suspension. The service is offered by the 
Entrust PKI [18], which provides transparent and automatic key management in 
application-specific components written in different programming languages, e.g., 
Java. The certificate service is implemented to realize a local cache of the most recently 
used X.509 certificates and certificate revocation lists at any PNC node to improve the 
efficiency of integrity, secrecy and authentication checks. When security operations 
require certificates that are not present in the local cache, the needed certificates are 
requested from the Entrust PKI together with their corresponding revocation/suspension 
status. It is worth noting that in a realistic scenario different PNC administrators may 
wish to adopt different PKI solutions depending on their particular management and 
security policies. For this reason, we are also examining the interoperability issues that 
stem from the integration of our PNC with different and heterogeneous PKIs. In 
addition, all the basic security services can benefit from the policy/role management 
service. This service increases the usability of access control policies when dealing 
with a large-scale PNC network infrastructure that provides services to a potentially 
large number of users. The service adopts the Ponder policy language [19] to model 
the actions that agents are permitted/forbidden to perform on the PNC node. 
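
The local cache of most recently used certificates described in this section can be sketched with a bounded, access-ordered map. The sketch is an assumption-laden illustration: the class CertificateCache, the generic certificate type and the pkiLookup function standing in for a request to the external PKI are hypothetical, and revocation status is not modelled. The sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded cache that falls back to a PKI lookup on a miss. */
final class CertificateCache<C> {
    private final Map<String, C> cache;
    private final Function<String, C> pkiLookup; // stands in for the remote PKI request
    private int remoteRequests = 0;

    CertificateCache(final int capacity, Function<String, C> pkiLookup) {
        this.pkiLookup = pkiLookup;
        // accessOrder=true keeps recently used entries; the eldest entry is evicted.
        this.cache = new LinkedHashMap<String, C>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, C> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the cached certificate, or fetches and caches it on a miss. */
    C get(String subject) {
        C certificate = cache.get(subject);
        if (certificate == null) {
            certificate = pkiLookup.apply(subject);
            remoteRequests++;
            cache.put(subject, certificate);
        }
        return certificate;
    }

    /** Number of lookups that had to go to the PKI. */
    int remoteRequests() {
        return remoteRequests;
    }
}
```

Only misses reach the PKI, so repeated checks for the same principals stay local to the node.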

In addition, the policy/role management service provides the required support to map Ponder policy specifications 
into platform-dependent policies that can be interpreted and enforced at run-time in 
the system. In particular, the service includes a policy/role graphical user interface for 
the specification, editing, and administration of policies/roles and a policy repository, 
local to the PNC node, for the storage and retrieval of policy/role information. The 
policy/role management service is designed to support dynamic roles/policies 
modifications with no need to suspend PNC operations. Administrators can modify the 
security policies of the managed resources and the changes are propagated 
automatically to involved PNC nodes, and consequently to the resource proxies. 

Fig. 4. The PNC architecture of security services 

The PNC node also provides an on-line monitoring service that permits system 
managers to control and prevent any agent excess in resource consumption, by making 
the usage of PNC local resources visible. The monitoring service can be configured 
to visualize the utilization of the local processor, the quantity of used memory and the 
generated network traffic, both for any Java thread and any other process outside the 
Java Virtual Machine. To reduce the overhead effect of on-line monitoring on PNC 
performance, our monitoring service can be dynamically tuned to observe only a 
subset of executing threads, possibly with different observation frequencies. For 
instance, to face denial-of-service attacks, we collect the CPU utilization percentage 
only for the agent threads responsible for active packet execution; when one thread 
exceeds a threshold, the PNC alerts the system administrator and begins to collect and 
visualize all possible monitoring information about the specified thread, possibly 
with an increased frequency. 
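
The threshold check on per-thread CPU utilization can be sketched as follows. Note that this sketch uses the standard java.lang.management API rather than the JVMPI named in the text, and that the class CpuThresholdMonitor, its methods and the sampling scheme are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical monitor that flags threads whose CPU share exceeds a threshold. */
final class CpuThresholdMonitor {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    private final double threshold; // fraction of one CPU, e.g. 0.8

    CpuThresholdMonitor(double threshold) {
        this.threshold = threshold;
    }

    /** CPU share over an interval, given CPU time and wall-clock time in nanoseconds. */
    static double utilization(long cpuNanos, long wallNanos) {
        return wallNanos <= 0 ? 0.0 : (double) cpuNanos / wallNanos;
    }

    boolean exceeds(long cpuNanos, long wallNanos) {
        return utilization(cpuNanos, wallNanos) > threshold;
    }

    /** Samples one thread over the given interval; true means it should be reported. */
    boolean sample(long threadId, long intervalMillis) throws InterruptedException {
        long cpuBefore = threads.getThreadCpuTime(threadId);
        long wallBefore = System.nanoTime();
        Thread.sleep(intervalMillis);
        long cpuAfter = threads.getThreadCpuTime(threadId);
        long wallAfter = System.nanoTime();
        if (cpuBefore < 0 || cpuAfter < 0) {
            return false; // thread ended or CPU time measurement is disabled
        }
        return exceeds(cpuAfter - cpuBefore, wallAfter - wallBefore);
    }
}
```

A monitoring loop could call sample only for the agent threads of interest and shorten the interval for a thread once it has been flagged, which mirrors the tuning strategy described above. On JVMs that do not support thread CPU time, getThreadCpuTime throws UnsupportedOperationException, so a real monitor would check isThreadCpuTimeSupported() first.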

The collected monitoring information is obtained in two different ways. On the one 
hand, we exploit platform-dependent functionality (Solaris/Linux /proc directory, 
Microsoft Windows NT system registry), integrated in the PNC via the Java Native 
Interface [20]. On the other hand, to permit fine-grained monitoring visibility of all 
Java threads, we use the novel Java Virtual Machine Profiler Interface. The JVMPI is 
proposed by Sun within the latest version of the Java platform, to notify Java 
applications of several kinds of events that can take place in the virtual machine [21]. 
The result is a common monitoring API that abstracts from the PNC hosting platform 
(Solaris, WindowsNT and Linux are currently supported) and that is mapped 
transparently to the correct platform-dependent dynamic libraries at run-time. 



5 Final Remarks 

In the global environment provided by the Internet, many application areas have 
experienced an exponential growth in the number of interested developers and users. 
There are several services and protocols that could give new impetus to this scenario, 
but their deployment is currently limited due to the long and difficult standardization 
process. The application of PN technologies to the Internet infrastructure could further 
boost its importance, because PN could accelerate the deployment of new 
service-specific protocols that can be installed at run-time. 

However, the PN potential has not been exploited completely because of the lack of 
general agreement on comprehensive and accepted security frameworks. Only the 
definition of general security standards, or, at least, of more precise security 
recommendations, can produce the momentum needed to grant durability to the PN 
design efforts. 

The paper has considered how several PN proposals have faced the issues 
connected with security. The paper does not give a complete classification but should 
help security service designers in better understanding the properties offered at 
different layers. The aim is to drive the design of a PN security framework offering a 
wide range of solutions and tools to reconcile the contrasting requirements of 
flexibility and efficiency. 

As a final consideration, PN emphasize programmability for network components 
but also call for programmability of the security framework itself, to fully adapt to 
different environments, to diverse user expectations, and to various requirements in 
performance. 
Abstract. Any security architecture for a wide area network system 
spanning multiple administrative domains will require support for policy 
delegation and certificate distribution across the network. Practical solu- 
tions will support local autonomy requirements of participating domains 
by allowing local policies to vary but imposing restrictions to ensure overall 
coherence of the system. This paper describes the design of such a 
system to control access to experiments on the ABone active network 
testbed. This is done through a special-purpose language extending the 
Query Certificate Manager (QCM) system to include protocols for secure 
mirroring. Our approach allows significant local autonomy while ensuring 
global security of the system by integrating verification with retrieval. 
This enables transparent support for a variety of certificate distribution 
protocols. We analyze requirements of the ABONE application, describe 
the design of a security infrastructure for it, and discuss steps toward 
implementation, testing and deployment of the system. 

Keywords: Security policy, certificate distribution, local autonomy, ac- 
cess control, ABone, active networks, QCM, Query Certificate Manager. 



1 Introduction 

Active network systems will require practical approaches for managing access 
control information securely and conveniently on a wide area network. The tech- 
nology for doing this will need to reach beyond the current state of the art for 
policy description techniques and certificate distribution. A good test case for 
understanding requirements and possible solutions for the problem is the man- 
agement of the access control mechanism for experiments on the ABone [2], an 
emerging testbed for research in active networks. Requirements and solutions 
for the ABone also have relevance for many other wide area network systems 
beyond active networking. 

The aim of this paper is to discuss the requirements and design of an access 
control infrastructure suited to wide area systems like the ABone. The princi- 
pal focus is on the concept of local autonomy, wherein nodes within multiple 
administrative domains are allowed to define their own policies. This capability 
has two primary policy aspects: authorization policy and certificate distribution 
policy. There have been a number of recent proposals about how to express 
authorization policies in large-scale distributed systems [4, 16, 5, 3]. Work on 
certificate distribution has focused on the design of directory systems such as 
DNS/DNSSEC [19, 17, 1], the ISO Directory [13], and the Lightweight Directory 
Access Protocol (LDAP) [22, 12]. Such directories are used to hold certificates 
(digitally signed documents) providing information on which authorization poli- 
cies are based. Other relevant work [14, 11] focuses on the formats of certificates 
and protocols like chaining of certificate authorities and revoking certificates. 
A key challenge for the use of such systems by wide area network applications 
is reconciling the demands of local autonomy with the functionality of the dis- 
tributed system as a whole. Local autonomy is needed in the ABone and many 
other wide area distributed systems because different domains have needs and 
goals that may be in conflict. For scalability, the policy of one domain may have 
to rely on the policy of another domain. But a domain should be free to define 
its policy by taking what it wants from another domain’s policy and discarding 
what is not appropriate. However, it is possible that such variations lead to a 
situation in which no domain is able to maintain the policy it requires, given its 
reliance on other domains with different policies. 

Our own work on security infrastructure has focused on ideas for integrat- 
ing verification and certificate retrieval using a technique called policy-directed 
certificate retrieval [7]. The basic idea is that the verifier is in the best posi- 
tion to determine what certificates are required, so it can be used effectively for 
the retrieval of certificates. Our implementation of this idea is a system called 
Query Certificate Manager (QCM), which enables a verifier to express policies 
for distributing certificates and retrieving them automatically as part of verification. 
The aim of QCM is to accommodate significant flexibility for both access 
and retrieval policies while ensuring consistent global security and tractable dis- 
tributed computation. We therefore explore the idea of using policy-directed 
certificate retrieval as realized by QCM in maintaining the ABone access control 
infrastructure. 

The ABone is under design currently (see [2] for a description of objectives 
and approach), so it requires both short and long term solutions. A short-term 
solution must support a modest number of participating nodes with an approach 
that can be implemented almost immediately with little impact on existing soft- 
ware for the ABone, namely the various Execution Environments (EE’s) for 
evaluating active packets and the Active NETwork (ANET) [20, 21] system for 
installing EE’s. A long-term solution will need to provide support for an ex- 
panding collection of nodes and more complex access policies and distribution 
strategies. 

We begin our analysis in the second section of this paper with a discussion 
of the simple authorization architecture initially used by ANET. We explain 
why this approach can be improved and present an architecture and language 
for our approach. In the third section we describe secure mirroring protocols, 
which can be used to provide a simple ACL management system with modest 
local autonomy. In the fourth section we describe more advanced features based 
on policy-directed certificate retrieval as applied to the ABone and show how 
these features provide significant local autonomy. A final section summarizes 
conclusions. 



2 ABONE Requirements and Proposed Infrastructure 

The ABone is a collection of computers being used to run active networking 
experiments over the Internet. An active network system consists of one or more 
Active Applications (AA’s) running on top of an EE that defines the seman- 
tics of code contained in active packets. ANET is the system for installing 
EE’s on ABone nodes, which are mainly Unix hosts currently, but could be 
special-purpose active routers. ANET provides support for a server process called 
ANETD, the ANET Daemon, that responds to requests to run ABone experi- 
ments on the node where the daemon is running; ANET also provides a client 
that allows users to request and configure experiments on the ABone testbed 
by contacting ANETD servers. In the current version of ANETD, permission 
to carry out such an experiment is determined by the nodes on which the ex- 
periment will be run by consulting an Access Control List (ACL) consisting of 
public keys of principals permitted to perform experiments. In a typical scenario, 
a claimant client approaches a verifier server running ANETD with a certificate 
requesting access to the server for the purpose of conducting an experiment. 
Each user of the ABone generates their own 512-bit RSA key pair and regis- 
ters the public key with a master server, currently operated at SRI. The master 
server at SRI maintains a master ACL listing the public key and host of every 
ABone user. An example is given in Fig. 1. When a claimant makes a request, it 



alice.com       r8K+gZ4ZRo5usA675...
bob.com         udoN0w7B0K65hhwpw...
careless.org    umVy3uvlLpaSx7W83...



Fig. 1. An ABone access file, with keys abbreviated 



is signed with his private key, and the verifier checks this against public keys of 
permitted users. In the early versions of ANET, the first time ANETD ran on 
an ABone node, it queried the master server for the current ACL of permitted 
users. Once acquired, maintenance of this list was left to the administrator of 
the ABone node. Each local administrator was free to modify their copy of the 
list by adding or deleting users. This provided every ABone node autonomy over 
its own access policy. 

This approach is sufficient for a few nodes if administrators are diligent about 
maintaining their ACLs, and the number of nodes is not changing much. How- 
ever, it is hoped that the ABone will grow to more than a thousand nodes within 



280 Pankaj Kakkar et al. 



a few years, and, for several reasons, this strategy will not scale. First, when a 
new user joins the ABone, their key is posted to the ACL of the master server, 
but there is no mechanism for propagating the new key to machines already run- 
ning ANETD. If the user needs to run an experiment on these machines, their 
administrators will probably need to be contacted individually. Second, each site 
administrator maintains their copy of the list by hand. So, even if the hosts and 
keys of new users were distributed from the master server automatically, the 
administrators would have to process them by hand. Also, local policies are not 
written down anywhere; they just make their effects known in the local copy 
of the list. For example, if an administrator believes that careless.org has 
been infiltrated, he can delete every key associated with careless . org from his 
copy of the list, but there is no record to tell him not to put new keys from 
careless . org onto the list. 

There are at least four basic strategies for dealing with these problems using 
certificates (signed documents) based on the following fundamental tradeoffs: 

1. Whether the claimant or the verifier is responsible for obtaining certificates, and 

2. Whether the certificates are long-term or short-term. 

For instance, one idea is to allow the master server to issue certificates to 
permitted users asserting their right to do experiments. The opposing idea is to provide 
a means for ANETD servers to consult the master server about requests to set 
up experiments before or as the requests arrive. Both cases break down into 
significantly different solutions depending on whether certificates are long-term 
or short-term. Consider first the case in which the claimant proves permission 
by providing a certificate. In the short-term case, the master server could issue a 
certificate to a principal for a period just long enough to set up an experiment. 
This has the disadvantage of requiring the server to be consulted many times 
by the same principal if many experiments must be conducted. If a long-term 
certificate is issued instead, the need for repeated requests for new certificates 
will be reduced. However, if the principal then loses privileges or suffers a 
compromise of his private key, some system may be needed to revoke the 
certificate. Now consider the case in which the verifier 
proves permission based on a signed request from a claimant. The tradeoff 
between long-term and short-term certificates remains the same, but in this case 
the verifier can check a local ACL for information about the claimant and act 
accordingly, without expecting the claimant to supply any additional certificates. 
An advantage of this approach is that it need not place any new responsibilities 
on either the ANET client or server. A claimant does not need to obtain or 
maintain the freshness of any certificates to make requests, and the verifier need 
not know how its ACL is being kept up-to-date. 

Given these considerations, the use of verifier-gathered certificates provides 
a simpler and more modular approach to improving ABone ACL maintenance. 
Given a design goal for a short-term solution that entails no changes in the 
behavior of ANETD, the best solution is to mirror the ACL of the master server 
at each server location. We consider two protocols for secure mirroring based on 
an online signature from the master server and the ability of ANETD servers to 
establish local policies about freshness. Note that ANETD servers are clients of 
the master ANET server, so they are the clients in the following protocols: 

Client Pull The client periodically requests a fresh copy of the ACL by sending 
a hash of its current copy. The server checks to see if the master ACL has 
this hash. It sends a fresh signed copy if it does not, otherwise it sends a 
notification that the client ACL is still up-to-date. 

Server Push The server accepts requests from clients to register for updates to 
the master ACL and supplies the master ACL upon registration. Whenever 
the master ACL changes, the new ACL is signed and distributed to registered 
clients. Since clients may become unreachable, the server times out entries 
in its register so clients must periodically re-register themselves. 

Details and comparative discussion of these protocols will be provided in 
the next section. The mirroring protocols provide a very modest degree of local 
autonomy to ABone nodes. A node may choose how frequently it wishes to 
update its ACL, but will not be able to customize the contents of that ACL. 
Moreover, claimants are unable to provide credentials certifying their rights, so 
they will need to rely on the freshness of ACLs at verifiers. If a node uses a client 
pull with a low frequency of update then a claimant may be unable to obtain 
access for a substantial period. 

Although it is a substantial improvement over manual maintenance of ACLs, 
local autonomy over mirroring the local ACL is a somewhat weak degree of 
authority. To go beyond this, it is desirable to think in terms of a different 
architecture where ANETD and QCMD communicate in a more sophisticated 
manner. A possible architecture is illustrated in Fig. 2. In this version of ABone 




Fig. 2. QCMD Communication 



security, a QCM daemon, QCMD, runs on each ABone node along with ANETD. 
QCMD is responsible for maintaining the policies of the node and for certificate 
distribution. ANETD addresses all policy questions directly to QCMD, instead 
of looking at a local copy of the ABone user list. When QCMD needs to perform 
certificate distribution, it exchanges messages with QCM daemons on other 
nodes. 

With this architecture, the ABone is able to take advantage of the mirroring 
protocols as well as the following QCM protocols: 

Certificate Push Certificates are accepted from a claimant and used to avoid 
reference to remote policies. 

Online Query/Response A QCM policy may refer to a policy at a remote 
principal. If a request is made for a certificate from the remote principal this 
is retrieved with appropriate signatures. The certificate is created dynami- 
cally using an online key. 

Verify Only Verification is based only on local policies and certificates pushed 
by the claimant. 

Offline Query/Response A collection of certificates may be created in ad- 
vance with offline signing. A request for a certificate is answered with all 
relevant certificates thus created and the recipient constructs the necessary 
response from these. 

The main point here is that all six of these retrieval and verification mechanisms 
can be made to work coherently together with each claimant and verifier choosing 
its own strategy. These protocols are further illustrated in Section 4. 

Our overall proposal for the ABone security infrastructure is to provide a 
special-purpose language supporting six protocols to enable local autonomy with 
policy-directed certificate retrieval. A grammar for the language is provided in 
Table 1. This is simplified from what would be required for the actual system. 
For instance there is a need for wrappers to expose underlying data sources. 



3 Mirroring Protocols 

Mirroring is a common strategy for certificate distribution. Under mirroring, the 
master ACL is kept at the master server, but all the other ABone nodes have a 
copy, and changes to the master ACL are propagated to the copies. Both push 
and pull protocols ensure a weak consistency between the master policy and 
the mirrors: the mirrors may be out of date with respect to the master, but 
changes are guaranteed to propagate within a specified time window based on 
the reliability of the connection between the master server and its mirrors. 



3.1 Characteristics of the Protocols 

In the ABone implementation, the access control lists are treated as public in- 
formation, so we are not concerned about maintaining the confidentiality of the 
lists. However we are concerned about integrity; we want the copy of the access 
control list to be an accurate copy. By ‘accurate’ we mean the copy of the ACL 
is the same as a recent version of the central access control list. There are two 
ways integrity can be violated: 

Table 1. QCM With Mirroring 

d ::= x = e                               definitions 
    | x = import(data)                    import 
    | x1 = pullclient(c$x2, t1, t2)       pull client 
    | pullserver(x)                       pull server 
    | x1 = pushclient(c$x2, t1, t2)       push client 
    | pushserver(x, t)                    push server 
e ::= c                                   constants 
    | t                                   time periods 
    | x                                   local names 
    | (e$x)                               global names 
    | (e1, ..., en)                       products 
    | {e1, ..., en}                       sets 
    | ∪e                                  set union 
    | {e | g1, ..., gn}                   comprehensions 
g ::= (e1 = e2) 
    | (e1 ≠ e2)                           guards 
    | (p ∈ e)                             generators 
p ::= x | c | (p1, ..., pn)               patterns 


1. The mirrored list is corrupted — for example, it contains entries that were 
never in the master ACL. 

2. The mirrored list is out of date — for example, the ACL contains an entry 
that was valid at one point but is invalid now. 

The mirroring protocols use digital signatures to ensure integrity. The messages 
are not encrypted, since the data is not considered confidential. We are less 
concerned about availability since we expect the ACL to change comparatively 
slowly. Also, the characteristics of the server and network mean that some kinds 
of denial of service attacks are very difficult to stop. On the other hand, we do 
not want the protocol to make it easier for an adversary to mount a denial of 
service attack. To aid this objective, both the mirror protocols use timestamps to 
frustrate certain kinds of attacks. Since clocks will not be perfectly synchronized, 
we use a freshness threshold time period f. Timestamps are considered fresh if 
they are within plus or minus f time units of local time. 
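
As an illustration, the freshness rule above amounts to a one-line comparison. The following is a minimal Python sketch; the function and parameter names are our own assumptions and are not taken from the QCM implementation.

```python
def is_fresh(timestamp: float, local_time: float, threshold: float) -> bool:
    """Return True if `timestamp` lies within +/- `threshold` of `local_time`.

    `threshold` plays the role of the freshness threshold time period;
    all three values are assumed to use the same time unit (e.g. seconds).
    """
    return abs(local_time - timestamp) <= threshold
```

A larger threshold tolerates more clock skew between nodes but also widens the window in which a recorded message can be replayed.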

3.2 Pull Mirror Protocol 

The pull protocol puts the onus on the client to make sure it has an up-to-date 
copy of the mirrored data. The protocol is parameterized by two time descriptors: 
the period for sending update requests to the server, CRequestP, and the amount 
of time that the client waits for a valid server response before it resets its ACL, 
CResetTO. The parameters are passed in the declaration on the client C: 

x = pullclient(K_S$y, CRequestP, CResetTO). 

The server S must declare its willingness to engage in the protocol for this data: 

pullserver(y). 

We use the notation S_K(M) for the triple (M, K, σ), where M is a message, K is 
a public key, and σ is the hash of M signed by the private key corresponding 
to K. The protocol works as follows: 

1. The client periodically (with a period of CRequestP) checks with the server 
to be sure that its copy of the data is up-to-date. It does so by sending a 
Changed? message to the server. The Changed? message contains the name y 
of the policy being mirrored (e.g., ACL), the hash h of the client’s version of 
the data, a timestamp t_C, and the server’s public key K_S: 

C → S : S_C(Changed?(y, h, t_C, K_S)) 

The client keeps a record of the timestamp t_C, which is used to synchronize 
with the server’s responses. 

2. On getting such a request from a client, the server first checks the signature 
on the message, checks the freshness of the request using the timestamp in 
the message, and verifies that the message was meant for it (using K_S). If 
the hash of the data at the server is different from the hash h sent by the 
client, then the client is out-of-date, so the server sends the new data v to the 
client, including the timestamp sent by the client, a new timestamp t_S based 
on its current time, and the client’s public key K_C: 

S → C : S_S(NewVersion(y, v, t_C, t_S, K_C)) 

If the hashes are the same, then the client must have an up-to-date version 
of the data, so the server responds by sending a NoChange message: 

S → C : S_S(NoChange(y, t_C, t_S, K_C)) 

3. When the server’s response is received by the client, it checks the signature 
and freshness as before, then verifies that the message was meant for itself, 
and then confirms that the t_C in the server’s response matches the value it 
remembered. If these checks succeed, and the message received by the client 
contains a new version of the policy being mirrored, then the client updates 
its policy. Otherwise the policy is left unchanged. 

4. If CResetTO time passes since the last response was received from the server 
(responses could be lost due to network congestion or a malicious agent in 
the network) the client resets the local copy of the policy to the null set. 
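
The four steps above can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the QCM implementation: the HMAC-based `sign` and `verify` helpers are a runnable stand-in for the public-key signatures the protocol requires, principals are identified by short names rather than by their keys, and all names (`KEYS`, `PullClient`, `server_handle`, the 5-second `FRESHNESS` value) are hypothetical.

```python
import hashlib
import hmac
import json

FRESHNESS = 5.0  # assumed freshness threshold, in seconds
# Hypothetical per-principal secrets. HMAC is only a runnable stand-in here:
# the protocol itself calls for public-key signatures.
KEYS = {"C": b"client-secret", "S": b"server-secret"}


def digest(policy):
    """Hash of a policy, represented as a set of (host, key) pairs."""
    return hashlib.sha256(json.dumps(sorted(policy)).encode()).hexdigest()


def sign(signer, body):
    """Stand-in for S_K(M): return the message together with its tag."""
    payload = json.dumps(body, sort_keys=True).encode()
    return body, hmac.new(KEYS[signer], payload, hashlib.sha256).hexdigest()


def verify(signer, message):
    """Check that `message` was produced by sign(signer, ...)."""
    body, tag = message
    return hmac.compare_digest(tag, sign(signer, body)[1])


def server_handle(request, now, master_acl, server="S"):
    """Step 2: validate a Changed? request and build the signed reply."""
    body = request[0]
    if not verify(body["from"], request):
        return None
    if abs(now - body["tc"]) > FRESHNESS or body["to"] != server:
        return None  # stale, or addressed to a different server
    reply = {"y": body["y"], "tc": body["tc"], "ts": now, "to": body["from"]}
    if digest(master_acl) != body["h"]:
        reply.update(type="NewVersion", v=sorted(master_acl))
    else:
        reply["type"] = "NoChange"
    return sign(server, reply)


class PullClient:
    def __init__(self, name, server, reset_timeout):
        self.name, self.server = name, server
        self.reset_timeout = reset_timeout  # CResetTO
        self.acl = set()
        self.pending_tc = None
        self.last_response = 0.0

    def make_request(self, now):
        """Step 1: build a signed Changed? message and remember its timestamp."""
        self.pending_tc = now
        return sign(self.name, {"type": "Changed?", "y": "ACL",
                                "h": digest(self.acl), "tc": now,
                                "from": self.name, "to": self.server})

    def handle_reply(self, reply, now):
        """Step 3: accept the reply only if every check succeeds."""
        if reply is None or not verify(self.server, reply):
            return
        body = reply[0]
        if (abs(now - body["ts"]) > FRESHNESS or body["to"] != self.name
                or body["tc"] != self.pending_tc):
            return
        self.last_response = now
        if body["type"] == "NewVersion":
            self.acl = {tuple(entry) for entry in body["v"]}

    def check_reset(self, now):
        """Step 4: fall back to the empty policy after CResetTO of silence."""
        if now - self.last_response > self.reset_timeout:
            self.acl = set()
```

In this sketch a caller would invoke `make_request` every CRequestP time units, pass the result to `server_handle`, feed the reply to `handle_reply`, and call `check_reset` periodically.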
The protocol preserves data integrity: because the reply from the server is 
signed, the client can be sure that the data was not tampered with, and, be- 
cause it contains the timestamp, the client knows it is fresh. Moreover, we can 
guarantee that anybody who is in the local copy of the policy was in the server’s 
master copy at some point in time. The worst damage an attacker could cause 
is to deny everyone access to a client by blocking traffic between the client and 
server for a sufficiently long time, thus causing the client’s copy to reset. We 
believe this is better than allowing access to someone who is no longer in the 
server’s policy, which could happen with a stale local copy. 

The timestamps are essential to the security of the protocol and are intended 
to reflect recommendations in the ISO/IEC standard for entity authentication 
using digital signatures [15]. Consider an alternate version of this protocol in 
which we eliminate the timestamps. Of course, this would allow a replay attack; 
an adversary could save old messages and then send them later to confuse clients. 
This weaker protocol may also enable a kind of denial of service attack we call 
traffic amplification. Adversaries could exploit the protocol to effectively amplify 
the amount of junk traffic that they can generate. 

If an adversary E wanted to overload the network connection of C, then E 
could simply send junk packets to C. But E’s ability to overwhelm C is limited 
by the bandwidth of E’s network connection. The traffic amplification problem 
could occur if the adversary E saved a Changed? message from C. E could then 
send the message over and over again to the server S. If S responded to each 
message by sending a large file to C then the resulting traffic may clog C’s network 
connection. Thus S amplified E’s ability to clog C’s bandwidth. 

The timestamps allow the server and clients to ignore messages which are not 
fresh. An adversary can only clog C’s connection while the Changed? message 
is fresh. If the server keeps track of which timestamps it has seen then the 
attack can be prevented entirely; the server can discard messages that contain 
timestamps that the server has already seen. This does not put much burden on 
the server because timestamps only have to be saved until they become stale. 
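
The bookkeeping described in the last paragraph can be sketched as a small filter. This Python sketch is illustrative only; the class and method names are assumptions, and signature checking is left out so that only the timestamp logic is shown.

```python
class ReplayFilter:
    """Remember recently seen (sender, timestamp) pairs and reject repeats."""

    def __init__(self, threshold):
        self.threshold = threshold  # freshness threshold, same unit as timestamps
        self.seen = set()           # (sender, timestamp) pairs that are still fresh

    def accept(self, sender, timestamp, now):
        # Stale entries can be forgotten: a replay of them would fail the
        # freshness check below anyway, so the set stays small.
        self.seen = {(s, t) for (s, t) in self.seen
                     if abs(now - t) <= self.threshold}
        if abs(now - timestamp) > self.threshold:
            return False  # not fresh
        if (sender, timestamp) in self.seen:
            return False  # replay of a message that was already processed
        self.seen.add((sender, timestamp))
        return True
```

The set never holds more entries than the number of distinct requests received within one freshness window, which is why the extra burden on the server stays small.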

3.3 Push Mirror Protocol 

The push approach puts the onus on the server to make sure that changes to the 
data are propagated to the clients. The protocol is parameterized by three time 
descriptors: a client side re-registration period CRegisterP, a server side registry 
flush period SRegisterTO and a client side policy reset timeout CResetTO. Two 
parameters are passed in the declaration on the client: 

x = pushclient(K_S$y, CRegisterP, CResetTO). 

The remaining parameter is supplied by the server declaration: 

pushserver(y, SRegisterTO). 



Here is the protocol: 
1. The client tells the server that it wants to receive updates for the policy y. 
It sends to the server the name of the policy, a hash of the current local copy 
of the policy, a timestamp, and the server’s public key. 

C → S : S_C(RegisterMe(y, h, t_C, K_S)) 

2. The server checks the signature and freshness, and whether y has been 
declared available for push mirroring (via pushserver). It adds the client to its 
table of registered clients if these checks succeed. It sends an immediate 
NoChange or NewVersion message as in the previous protocol. After that, 
whenever the server’s policy y changes, the server sends updates through 
Update messages to registered clients: 

S → C : S_S(Update(y, v, t_S, K_S)) 

3. Whenever the client receives an Update message, it checks the signature, 
freshness, and origin. If these succeed, it updates the local copy of y to v. 

4. After SRegisterTO time passes the server removes the client from the table 
of registered clients. The server will not send any more updates to the client 
until it receives a new RegisterMe message. The client will re-register by 
sending a new RegisterMe every CRegisterP time. 

5. As in the pull protocol, if an update from the server is not received for 
CResetTO time, the client resets its local copy of the policy to the null set. 
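
The interplay of the three time parameters can be sketched in Python as follows. This is a simplified model under stated assumptions rather than the actual implementation: signatures, freshness checks, and the hash comparison are omitted, the server always answers a registration with its current policy, and any server message (registration reply or Update) is assumed to restart the client's CResetTO timer. All class and method names are hypothetical.

```python
class PushServer:
    def __init__(self, policy, register_timeout):
        self.policy = set(policy)
        self.register_timeout = register_timeout  # SRegisterTO
        self.registered = {}  # client name -> time of its last RegisterMe

    def register(self, client, now):
        """Steps 1-2: record the client and answer with the current policy."""
        self.registered[client] = now
        return {"type": "NewVersion", "v": sorted(self.policy), "ts": now}

    def change_policy(self, new_policy, now):
        """Steps 2 and 4: drop expired registrations, then build Updates."""
        self.registered = {c: t for c, t in self.registered.items()
                           if now - t <= self.register_timeout}
        self.policy = set(new_policy)
        update = {"type": "Update", "v": sorted(self.policy), "ts": now}
        return [(client, update) for client in sorted(self.registered)]


class PushClient:
    def __init__(self, name, register_period, reset_timeout):
        self.name = name
        self.register_period = register_period  # CRegisterP
        self.reset_timeout = reset_timeout      # CResetTO
        self.acl = set()
        self.last_register = None
        self.last_message = 0.0

    def maybe_register(self, server, now):
        """Step 4: (re-)register once every CRegisterP time units."""
        if (self.last_register is None
                or now - self.last_register >= self.register_period):
            self.last_register = now
            self.on_message(server.register(self.name, now), now)

    def on_message(self, message, now):
        """Step 3: install the policy carried by a server message."""
        self.acl = set(message["v"])
        self.last_message = now

    def check_reset(self, now):
        """Step 5: reset to the empty policy after CResetTO of silence."""
        if now - self.last_message > self.reset_timeout:
            self.acl = set()
```

For the registration to stay alive, CRegisterP should be chosen shorter than SRegisterTO; otherwise the server drops the client between two re-registrations.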

3.4 Which Strategy: Push or Pull? 

The protocols have different advantages that depend on the kind of data that is 
mirrored and the network capabilities of the client and server. If the data changes 
infrequently then the push protocol may be more appropriate since messages 
will only be sent when the data actually changes. The push protocol also allows 
faster propagation of changes since the change can be passed on to the client 
immediately. The pull protocol allows the client to control its interaction with 
the server. If a client does not want frequent updates or can only connect with 
the server at certain times (at midnight, for example) then the pull protocol 
would be more appropriate. 

We have left open the question of whether the protocols are implemented 
reliably or unreliably and, in the case of the push protocol, whether updates 
from changes to the data are sent to registered clients by unicast or multicast. 
The experiments we describe next used unreliable unicast (UDP), and we have 
implemented a reliable unicast (TCP) version of the push protocol. 



3.5 Experiments 

Since mirroring is intended to provide short-term support for ABONE security, 
we wanted to know about how many nodes could be supported by these pro- 
tocols. Test data is unavailable currently. Various attempts have been made to 
simulate certificate retrieval, such as using DNS resolvers as a source of traffic 
information [18], but we are not convinced that this is worth the trouble for 
us, given the likely differences between DNS, which has a retrieval mechanism 
based on referrals and on-demand caching, and the system we propose, which 
uses mirroring. Moreover, it is somewhat questionable whether access control in- 
formation for the ABone has a traffic profile at all similar to resolution of domain 
name bindings. Hence we have used a straightforward stress model that assumes 
extremely frequent registrations of ABone users. We measure the failure rate of 
the protocols under this stress. Our experiments were conducted on a cluster of 
5 dual Pentium-II machines running Linux and an UltraSparc machine running 
SunOS. We ran an ANETD/QCMD server on the UltraSparc, and 500 client 
ANETD/QCMD’s on the cluster, 100 per machine. We conducted the following 
experiments: 

1. In the first experiment, we started all clients on the cluster almost simul- 
taneously. All 500 clients executed the pull protocol, with a 60 second gap 
between successive requests for the ACL. As a result, the server had to deal 
with very intense but short bursts of requests. We found that on the average, 
about 70% of the requests sent by the clients were dropped by the server. 

2. In a second experiment, we staggered the startup of the clients so that the 
demand on the server was more constant. Again, all clients executed the pull 
protocol. We found in this case that the server could handle all 500 clients 
well, and that no requests were being dropped. 

3. Finally, we tried a mixture of push and pull clients. 250 of the clients used 
the push protocol, while the rest used the pull protocol as before. Again, the 
server could handle all requests sent to it, and could get updates out to the 
push clients on time. 

The experiments suggest that mirroring will scale well beyond 500 nodes. In 
practice, we expect the master access list to change very slowly, so a 60 second 
delay in updates is overkill. Mirroring will be able to handle the projected growth 
of the ABone for the immediate future. However, since the system is currently 
being deployed, we will have the ability to conduct more direct tests of this claim 
in the future. 



4 Distribution Beyond Mirroring 

Let us now discuss other protocols required to provide more substantive support 
for local autonomy of ABone nodes. We describe how policy-based certificate 
retrieval can be achieved for the ABone by describing the QCM system protocols 
with sample ABone policies as examples. The basic functionality provided by 
QCM is to securely evaluate a policy to a table. If the policy is defined in terms of 
remote policies, then QCMD is responsible for securely obtaining those policies. 
Once QCMD has produced the table, ANETD can use it to decide whether to 
authorize requests. The grammar for the language is provided in Table 1. To keep 
things simple we have omitted the lifetimes for certificates; our implementation 
provides and checks expirations to prevent replay attacks and accidental use of 
old data. More details on the QCM implementation can be found in [7], and a 
formal semantics is provided in [6]. 

4.1 Strategies for Supporting Local Autonomy 

Security policies are often given in the form of a table. For example, access ma- 
trices, public key directories, and access control lists can all be thought of as 
tables, as can the ABone policy given in Fig. 1. QCM was designed to sup- 
port table-based security: in QCM every policy defines a table. The syntax 
for table definition is illustrated in Fig. 3, which gives the QCM equivalent 
of the ABone policy of Fig. 1. For backwards compatibility, we can also pro- 



{ ("alice.com",    Principal(RSA-MD5("r8K+gZ4ZRo5usA675..."))),
  ("bob.com",      Principal(RSA-MD5("udoN0w7B0K65hhwpw..."))),
  ("careless.org", Principal(RSA-MD5("umVy3uvlLpaSx7W83..."))) }



Fig. 3. QCM syntax for the table of Fig. 1 



vide notation to import ABone-style lists from files into QCM’s internal format: 
import("hosts.allow") is the QCM table (policy) obtained by reading in the 
ABone file hosts.allow. 

In QCM, policies are defined and controlled by principals who give them 
names. For example, a principal K_1 could give an access control list the name 
ACL, or a public key directory the name PKD. Nothing prevents another principal 
K_2 from assigning different definitions to the names ACL and PKD. To distinguish 
between the policies of different principals, we use global, or fully-qualified, 
names: K_1$ACL (pronounced “K_1’s ACL”), or K_2$ACL. 

When a principal needs to distribute its policies out into the network, it typi- 
cally does so using a signed document. Such a document cannot be used without 
verifying the principal’s signature, and this requires the principal’s public key. 
To ensure that a principal’s key is available when needed, we identify principals 
with their keys; that is, in QCM, principals are keys. Names and the strategy 
of using keys as principals are used in other policy languages, and we borrowed 
our notation for principals from one of them, SDSI [16]. Some examples appear 
in the table of Fig. 3. Since principals are long, we usually abbreviate them with 
K, K', etc. 

We can now give the QCM policy of the master ABone server, S: the server
has a public key Ks that is widely known, its policy is a definition

ACL = import("hosts.allow"),

and other ABone nodes can refer to the server’s policy as Ks$ACL.

We still need a way for an ABone node to define a policy that incorporates 
the server’s policy, but overrides it where desired. QCM supports this through 
composite policies that refine, augment, and combine the policies of multiple
principals. ACL1, below, is a composite policy.

ACL1 = { (h,k) | (h,k) <- Ks$ACL, h != "careless.org" }

Here, ‘|’ stands for ‘where,’ ‘<-’ stands for ‘is an entry of,’ and ‘!=’ stands
for ‘not equal to.’ Thus, in words, the policy says

ACL1 is a table with entries (h,k), where (h,k) is an entry of Ks$ACL,
and h is not "careless.org".

So, ACL1 discards some unwanted entries from the server’s table. Entries can
also be added, using union:

ACL2 = union(ACL1,
             { ("claire.com", Kclaire) })

where union(e1, e2) is syntactic sugar for ∪{e1, e2} (∪ is an operator that
takes a set of sets as an argument and returns the union of those sets as a result).

Finally, a composite policy can be built from the policies of multiple princi-
pals:

ACL3 = { (h,k) | (h,k) <- Ks$ACL, (h’,k’) <- Kbob$ACL,
         h=h’, k=k’ }

This is the intersection of the ACLs of Ks and Kbob: (h,k) is only an entry of
ACL3 if it appears in the policies of both S and Bob. This is how QCM can define
policies that depend on the joint authority of multiple principals.
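
The three composite policies above can be mimicked with ordinary set comprehensions. The following is only an illustrative sketch in Python, not QCM itself: tables are modelled as sets of (host, key) pairs, the key strings are placeholders for real principals, and the contents of Bob's table are invented for the example.

```python
# Tables are modelled as sets of (host, key) pairs. The key strings are
# placeholders standing in for real principals (public keys).
KS_ACL = {
    ("alice.com", "K_alice"),
    ("bob.com", "K_bob"),
    ("careless.org", "K_careless"),
}

# Hypothetical contents for Bob's table; the paper does not list them.
KBOB_ACL = {("alice.com", "K_alice"), ("dave.net", "K_dave")}

# ACL1: refine the server's table by discarding an unwanted entry.
ACL1 = {(h, k) for (h, k) in KS_ACL if h != "careless.org"}

# ACL2: augment ACL1 with a locally trusted entry (union).
ACL2 = ACL1 | {("claire.com", "K_claire")}

# ACL3: keep only entries that appear in both the server's and Bob's tables.
ACL3 = {
    (h, k)
    for (h, k) in KS_ACL
    for (h2, k2) in KBOB_ACL
    if h == h2 and k == k2
}
```

Each comprehension mirrors the corresponding QCM definition clause by clause, which is why ACL3 comes out as a plain set intersection.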

For example, suppose a client ANETD uses the policy ACL1 from above to
decide whether to grant requests to run active network experiments:

ACL1 = { (h,k) | (h,k) <- Ks$ACL, h != "careless.org" }

If ANETD receives a request originating at alice.com and signed by Kalice,
it needs to find out whether ("alice.com", Kalice) appears in the policy ACL1. It
uses QCMD to find out, by asking it to evaluate the following policy to a table:

{ "yes" | ("alice.com", Kalice) <- ACL1 }

If ("alice.com", Kalice) is an entry of ACL1, the policy will evaluate to the
table {"yes"}. Otherwise, the policy will evaluate to the empty table, { }. So, if
QCMD’s answer is {"yes"}, ANETD grants the request; otherwise, the request
is denied.

Policy evaluation is easy if all of the policies can be gathered in one place,
as we see here:

Ks$ACL = { ("alice.com", Kalice),
           ("bob.com", Kbob),
           ("careless.org", Kcareless) }

ACL1 = { (h,k) | (h,k) <- Ks$ACL, h != "careless.org" }

{ "yes" | ("alice.com", Kalice) <- ACL1 }

For example, to evaluate { "yes" | ("alice.com", Kalice) <- ACL1 }, first
evaluate ACL1 to a table, then iterate over every entry. If ("alice.com", Kalice)
appears, then add the entry "yes" to the result. ACL1 can be evaluated similarly.
The final answer is {"yes"}.
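
The local evaluation just described can be sketched as follows. This is a hedged Python illustration rather than the QCMD implementation: the function names are hypothetical and the keys are placeholder strings.

```python
# Placeholder data: the server's table with string stand-ins for keys.
KS_ACL = {
    ("alice.com", "K_alice"),
    ("bob.com", "K_bob"),
    ("careless.org", "K_careless"),
}


def evaluate_acl1(ks_acl):
    """Evaluate ACL1 to a table by filtering the server's table."""
    return {(h, k) for (h, k) in ks_acl if h != "careless.org"}


def evaluate_membership_query(entry, table):
    """Evaluate { "yes" | entry <- table }: iterate over every entry of the
    table and add "yes" to the result when the requested entry appears."""
    result = set()
    for row in table:
        if row == entry:
            result.add("yes")
    return result


def authorize(host, key, ks_acl):
    """Grant the request only if the query evaluates to the table {"yes"}."""
    answer = evaluate_membership_query((host, key), evaluate_acl1(ks_acl))
    return answer == {"yes"}
```

For instance, `authorize("alice.com", "K_alice", KS_ACL)` grants the request, while the same call for careless.org is denied because ACL1 filters that entry out.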

Usually, however, the necessary policies will not be available at every node. 
In this case QCMD must use some kind of retrieval protocol to obtain the remote 
policies. We support a variety of protocols, each with advantages and disadvan- 
tages for the parties involved. The rest of this section discusses the protocols 
and their tradeoffs. 



4.2 Certificate Push 

Most systems for verifying access control policies will not retrieve missing certifi- 
cates. Instead, they require certificates to be presented to the local policy engine 
by the party who wants to be authorized. We call this a ‘push’ protocol because 
certificates are not requested by the policy engine, they are supplied as inputs 
by some out-of-band means. QCM supports a push protocol using certificates of 
the following form. 

<Document = Member("ACL", ("alice.com", Kalice)),
 Signature = "mQinGCBzKGtza4X6...",
 Signer = Ks>

The certificate says that ("alice.com", Kalice) is an entry of the table
Ks$ACL. For the certificate to be valid, its signature must have been produced
by the signing key of Ks; this can be verified with Ks. (We also support certifi-
cates with expiration dates.) If Alice presents this certificate to ANETD along
with her request, ANETD simply passes it on to QCMD. QCMD verifies the
signature on the certificate, uses it to construct an approximation to the table
Ks$ACL, and adds the approximation to the local collection of policies:

Ks$ACL = { ("alice.com", Kalice) }

ACL1 = { (h,k) | (h,k) <- Ks$ACL, h != "careless.org" }

{ "yes" | ("alice.com", Kalice) <- ACL1 }

At this point, QCMD proceeds with local policy evaluation as usual, giving 
the answer {"yes"}. Of course, the approximation is not the actual value of 
the table Ks$ACL. QCM has an important monotonicity property that justifies 
use of the approximation. The table that QCM computes for the policy is an 
approximation of the real value of the policy. Monotonicity means that if QCM 
starts off with a better approximation for remote policies, then it computes a 
better approximation for the result. This means that if QCM says that "yes" is 
an entry of the table, then no additional information about Ks$ACL could imply 
that "yes" is not in fact an entry. 
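
Monotonicity can be illustrated on the running example. The sketch below is in Python with placeholder keys; it checks the property for this one policy only and is in no way a proof of the general result, for which see [6]. Approximations are modelled as subsets of the real table.

```python
def acl1(ks_acl):
    """ACL1 computed from whatever is currently known about Ks$ACL."""
    return {(h, k) for (h, k) in ks_acl if h != "careless.org"}


def query(ks_acl):
    """The table { "yes" | ("alice.com", K_alice) <- ACL1 }."""
    return {"yes" for entry in acl1(ks_acl) if entry == ("alice.com", "K_alice")}


# The real remote table, and two approximations (subsets) of it.
FULL = {
    ("alice.com", "K_alice"),
    ("bob.com", "K_bob"),
    ("careless.org", "K_careless"),
}
FROM_CERTIFICATE = {("alice.com", "K_alice")}  # built from Alice's certificate
NOTHING_KNOWN = set()                          # no certificates presented

# A better approximation of the input never yields a worse result.
assert NOTHING_KNOWN <= FROM_CERTIFICATE <= FULL
assert query(NOTHING_KNOWN) <= query(FROM_CERTIFICATE) <= query(FULL)
```

With no certificates the answer is the empty table, so the request is denied; it is never the case that a "yes" computed from partial knowledge disappears once more of the remote table is known.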

Monotonicity guarantees that no request is granted when the remote policy 
says it should be denied. But there is no guarantee that all requests that should 
be granted, will be granted. If Alice presents an invalid or expired certificate with 
her request, then under this protocol her request will be denied even though she
appears on the master access list at S. This places a significant burden on Alice: 
she has to manage her certificates, and periodically request new ones as they 
expire. For unsophisticated users a pull protocol may be more convenient. At 
the very least, users will require some automated support to help them manage 
certificates. And it may be necessary for the administrators of the system (e.g., 
the ABone nodes) to take on more responsibility for certificate distribution. Our 
other protocols show how this can be handled transparently in QCM, without 
any policy changes. 

4.3 Online Query/Response 

When QCM needs the value of a policy that is not available locally, its default
retrieval policy is to obtain it using a secure query/response protocol. The pro-
tocol involves two QCM daemons, a client that requests the policy, and a server
that supplies it.

In our example, the client C runs at an ABone node, and the server is running
at the master ABone node at S. To obtain Ks$ACL, C can invoke the following
protocol.

1. The client sends a query asking for the value of the policy Ks$ACL to the
server. (Recall that the principal Ks can be tagged with the location of its
server.)

C → S : Query(Ks$ACL)

2. The server has the complete definition of the policy Ks$ACL, so it can eval-
uate the query by simply looking up the table. It sends the table, T, back
to the client in a response signed with the key Ks:

S → C : Sign_Ks(Response(h, T))

The response contains a one-way hash h of the query, which can be used to
ensure that this is the answer to the question asked rather than a replay of
an answer to a different question.

3. The interpreter verifies the signature on the response using the key Ks and
uses the hash to match up the response with the query.
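
The three steps above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the QCM wire format: SHA-256 stands in for the one-way hash, and an HMAC over a demo secret stands in for the server's public-key signature (only because the Python standard library has no public-key signatures); the function names and the JSON encoding are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a shared demo secret replaces the server's signing key Ks.
SERVER_KEY = b"demo-key-standing-in-for-Ks"


def query_hash(query):
    """One-way hash h of the query text."""
    return hashlib.sha256(query.encode()).hexdigest()


def sign(message):
    """Stand-in for a signature produced with the signing key of Ks."""
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def server_respond(query, tables):
    """Step 2: look up the table T and return a signed Response(h, T)."""
    table = sorted(tables[query])
    body = json.dumps({"h": query_hash(query), "T": table}).encode()
    return body, sign(body)


def client_accept(query, body, signature):
    """Step 3: verify the signature, then match the response to the query."""
    if not hmac.compare_digest(sign(body), signature):
        raise ValueError("bad signature on response")
    response = json.loads(body)
    if response["h"] != query_hash(query):
        raise ValueError("response answers a different query")
    return {tuple(row) for row in response["T"]}
```

Because the hash of the query is covered by the signature, a correctly signed answer to one query is rejected when it is replayed as the answer to another.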

The policy asked in a Query message does not have to be a policy name like
Ks$ACL; it could be any policy at all. In our example, a better query would be:

C → S : Query({"yes" | ("alice.com", Kalice) <- Ks$ACL})

Instead of signing and returning the entire table Ks$ACL, the server just
checks whether ("alice.com", Kalice) is an entry, and returns {"yes"} or
the empty table accordingly. QCMD uses query optimization to automatically
choose queries that result in smaller responses.

The query/response protocol is invoked automatically by QCMD; ANETD
does not have to do anything different. Just as with the push protocol, ANETD
formulates the query it wants answered, then submits the query and any cer-
tificates it might have received from Alice to QCMD. QCMD uses whatever
certificates it can, and sends queries to obtain remote policies when it needs to.

The server could of course refuse to answer the client’s query, or, the server 
might be down. If so, the client QCMD returns an error. Another idea is to 
allow QCMD to assume its response is the empty table and continue evaluation. 
Since the empty table is a valid approximation for any other table, monotonicity 
guarantees that this cannot cause the request to be granted when it should be 
denied. Other available certificates might be sufficient to grant the request, so 
the client will continue with evaluation. 
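
The fallback described above might look like the following Python sketch. All names are hypothetical and `ConnectionError` is only a stand-in for whatever failure the real client would observe; the point is that an unreachable server is treated as the empty table and evaluation continues with the entries obtained from pushed certificates.

```python
def fetch_or_empty(fetch, policy_name):
    """Fetch a remote table; on failure assume the empty table, which is a
    valid approximation of any table."""
    try:
        return set(fetch(policy_name))
    except ConnectionError:
        return set()


def grant(entry, pushed_entries, fetch):
    """Decide a request from verified pushed entries plus whatever the
    server is willing and able to supply."""
    ks_acl = set(pushed_entries) | fetch_or_empty(fetch, "Ks$ACL")
    acl1 = {(h, k) for (h, k) in ks_acl if h != "careless.org"}
    return entry in acl1


def unreachable_server(policy_name):
    """Simulates a server that is down."""
    raise ConnectionError("server is down")
```

With the server down, a request backed by a valid pushed certificate is still granted, while a request with no supporting certificate is denied rather than granted by mistake.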

4.4 Verify Only 

The normal QCMD evaluation strategy is to take whatever certificates are
pushed at it, use them to the extent possible, and use query/response to ob- 
tain any other needed certificates. Sometimes the client may want to rule out 
even this limited query/response. Therefore we provide a mode, called verify-only 
mode, in which QCMD never invokes query/response. In this mode, a request 
will only be granted if all necessary certificates are presented to the client up 
front. 

4.5 Offline Query/Response 

The query/response protocol we described above required the server to sign
responses online. The server at S might prefer not to have the signing key online. 
In that case, pre-signed responses can be prepared on a machine not attached 
to the network, and stored on the server. Then the protocol works as follows. 

1. The client formulates its query, Q, as usual and sends it to the server. 

C → S : Query(Q)

2. On startup, the server is given the certificates that were signed offline. The 
certificates describe the content of tables defined by Ks; the server can 
reconstruct the tables from the certificates. Once the server has the tables 
it can evaluate Q; as it does so, it remembers what entries in the tables are 
accessed, and what certificates contributed to the entries. The server does 
not return the answer that it computes, since it cannot sign it; instead it 
returns the required certificates. Each individual certificate is signed, so the 
complete message does not have to be signed. 

S → C : Certificates(c1, ..., cn)

3. The client receives a Certificates message instead of the Response message
that it would have received if the server was doing online signing. The client
checks each of the signatures on the supplied certificates, and evaluates the
original query in verify-only mode, using the certificate push protocol to take
the supplied certificates into account.
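
The offline variant can be sketched in the same style. This Python fragment is an assumption-laden illustration, not the QCM implementation: each certificate asserts a single table entry, an HMAC over a demo secret stands in for the offline public-key signature, and the query is represented by a predicate on entries.

```python
import hashlib
import hmac

# Assumption: a demo secret replaces the signing key that is kept offline.
OFFLINE_KEY = b"demo-key-kept-off-the-network"


def sign_entry(entry):
    """Stand-in for the offline signature over one table entry."""
    return hmac.new(OFFLINE_KEY, repr(entry).encode(), hashlib.sha256).hexdigest()


def make_certificate(entry):
    """Prepared in advance on the machine that holds the signing key."""
    return {"entry": entry, "signature": sign_entry(entry)}


def server_answer(certificates, wanted):
    """Step 2: return the certificates whose entries the query accessed,
    instead of an answer that the server could not sign."""
    return [c for c in certificates if wanted(c["entry"])]


def client_evaluate(certificates, wanted):
    """Step 3: check each signature, then evaluate in verify-only mode."""
    table = set()
    for cert in certificates:
        if hmac.compare_digest(sign_entry(cert["entry"]), cert["signature"]):
            table.add(cert["entry"])
    return {"yes"} if any(wanted(entry) for entry in table) else set()
```

A certificate with an invalid signature is simply ignored, so it cannot contribute an entry to the table that the client evaluates.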

5 Conclusion

We have described a system and policy description language to maintain ac- 
cess control infrastructure for the ABone experimental testbed. Our approach 
is based on an extension of the QCM system and provides a significant degree 
of local autonomy as well as support for policy-directed certificate retrieval. We
have described two secure mirroring protocols and how they can be integrated 
with a collection of other protocols. The protocols have been implemented, and 
this paper describes some simple tests to measure the scalability of the mirroring 
protocols for the ABone. We end with a brief discussion of some additional issues: 
revocation, integration with other policy and certificate systems, and control of 
loops. 

If long-term certificates are used, then there is usually demand for a revoca- 
tion mechanism, that is, an ability to announce that a valid certificate which is 
properly formed and not expired should no longer be respected by verifiers. This 
introduces many complexities. We have developed a way to do policy-directed 
certificate retrieval with revocation: [6] describes a language and analyzes its 
security model rigorously. We have also developed an implementation for an 
internal language supporting this model of revocation (an external language 
would be used by policy writers and then compiled into the internal language). 
These extensions could be added to the language in Table 1 at the cost of more 
complexity than we could discuss in this paper. We refer the reader to [6] for 
details. In its initial deployment, QCMD will avoid both long-term certificates
and revocation. 

QCM uses its own certificate formats, but there does not appear to be any 
impediment to using X.509v3 formats if this would aid interoperability with 
other systems. It is unlikely that any single policy description system will satisfy 
all needs, so our expectation is that some ‘glue’ between such systems will be 
necessary. QCM seems best suited for coarse-grained access control such as that of
the ABone rather than fine-grained access control like the policy of a reference
monitor in an operating system. An interface for using QCM for access control 
in the PLAN EE [10] was developed by Hicks [8] and used to develop an ac- 
tive firewall application [9]. In this case, QCM was used to determine policies 
about which network services various agents were allowed to use. Efficiency was 
enhanced by caching information about QCM verification decisions. 

Another interesting problem with QCM is the threat of circular dependencies,
such as a situation in which principal A delegates to principal B, principal B
delegates to principal C, and principal C delegates to principal A. This problem
can be addressed by assuming that such circles of common interest work out 
a reasonable delegation structure among themselves ‘out-of-band’. Research is 
underway on an automated solution; for instance QCM queries could carry more 
information about their origins in order to detect the loop dynamically. 
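
One way a query could carry information about its origins is sketched below. This Python fragment is purely hypothetical and is not the automated solution under investigation: it assumes each principal's policy consists of local entries plus a list of principals it delegates to, and it threads the chain of principals already visited through the evaluation.

```python
def resolve(principal, policies, origins=()):
    """Evaluate a principal's table, following delegations. `origins` is the
    chain of principals the query has already passed through."""
    if principal in origins:
        chain = " -> ".join(origins + (principal,))
        raise RuntimeError("delegation loop detected: " + chain)
    policy = policies[principal]
    entries = set(policy["entries"])
    for delegate in policy["delegates_to"]:
        entries |= resolve(delegate, policies, origins + (principal,))
    return entries


# A delegates to B, B to C, and C back to A: the loop from the text.
CIRCULAR = {
    "A": {"entries": {"x"}, "delegates_to": ["B"]},
    "B": {"entries": {"y"}, "delegates_to": ["C"]},
    "C": {"entries": {"z"}, "delegates_to": ["A"]},
}
```

Calling `resolve("A", CIRCULAR)` raises an error naming the chain A -> B -> C -> A instead of recursing forever; with the last delegation removed, the same call returns the union of the three tables.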

We have developed implementations for all of the protocols described in this 
paper, but not yet in a unified system. The mirroring portion of the interface
has been implemented for deployment on the ABone, and this deployment is 
currently underway. Once deployed we hope to gain additional insights about 
load and interface requirements, as well as experience with what kinds of local
autonomy its users demand. 

Abstract. Network monitoring is of vital importance to the proper op-
eration and maintenance of any kind of network and it plays an espe-
cially important role in active networks. We discuss in this paper the
uniqueness of active network monitoring in the context of the ANSWER
system. ANSWER is an information discovery system built on an on-
tology based information hierarchy and an active network architecture.
We have developed a monitoring and management tool referred to as
ANMOS (Active Network MOnitoring System) which gives users enor-
mous flexibility in debugging, monitoring and fine-tuning the ANSWER
system. ANMOS has a layered architecture and can be readily extended
to interface with other types of active networks and serve as a generic
monitoring tool. We present the architecture and important system fea- 
tures of ANMOS and use them as the platform to highlight some of the 
distinctive characteristics of active network monitoring. 



1 Introduction 

Network monitoring is of vital importance to the proper operation and main-
tenance of any kind of network. It provides a way for the network operators
to observe the network conditions and identify “trouble spots” in the network.
Based on the monitoring results, proper measures can then be taken to fix prob-
lems and fine tune network performance. Network monitoring plays an espe-
cially important role in active networks. This is because network nodes are no
longer just packet forwarding engines; they offer an execution environment to the
active packets. The flexibility brought about by such a paradigm change means 
a greater dimension of freedom and subsequently a lot more aspects that need 
monitoring. Fortunately, such flexibility also brings great benefit to and provides 
a new framework for network monitoring. Network intelligence can be fully uti- 
lized to perform much more flexible and powerful monitoring tasks which can 
be extremely difficult and sometimes may not be possible in traditional net- 
works. These tasks, carried out in the form of service functions at network nodes 
and active packets from monitoring entities, are handled in exactly the same 
way as regular data transporting tasks in active networks. Such consistency not 
only enables monitoring of activities in the data transmission path, but also
greatly simplifies active network monitoring: it is no longer necessary to define
a separate suite of network management protocols as in the case of traditional
networks.

Looking Inside an Active Network
H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 296-315, 2000.
© Springer-Verlag Berlin Heidelberg 2000

In this paper, we discuss these unique features of active network monitor-
ing in the context of an active network based information system called
ANSWER (Active Network Supported WEb Rendezvous) [1]. It is a system de-
signed to address some of the problems in the information discovery model in
the current web. The fundamental concept behind the ANSWER model is a
system capability called “information routing”. This refers to the ability of the
underlying network to route packets which only contain specifications of interests
instead of explicit addresses. In ANSWER, this is partially achieved through on-
tology based semantic structuring [2,3,4] and active network [5] supported packet
forwarding.

Given the originality and the complexity of the ANSWER system, it is highly 
desirable that we have a user friendly front end which can be used as a tool to 
observe and improve the system. In particular, we would like to “look inside” 
the ANSWER network with the help of such a tool in order to better understand 
the operation of the system. For these purposes, we designed and implemented 
a monitoring interface with a rich set of features for the ANSWER system. It is 
referred to as ANMOS which stands for Active Network MOnitoring System. 
Through this interface, users can control and observe the network topology, the 
routing structure and various ontology trees. Query and information packets 
can be easily sent out from any ANSWER nodes and the users have the option 
to observe the packet routing process as an animation trace. These and many 
other features give users enormous flexibility in debugging, monitoring and
fine-tuning the ANSWER system. ANMOS has a layered architecture and can be
readily extended to interface with other types of active networks and serve as a
generic monitoring tool. Work is under way to provide such a generic extension
using XML as a glue to dynamically adapt the user front end and to interface with
multiple active network formats.

We present the most important features and the system structure of the 
ANMOS tool in this paper. In particular, we shall use it as the platform to 
highlight some of the distinctive characteristics of active network monitoring. 
Certain representative monitoring functions in the ANMOS interface and the 
corresponding active code for the ANSWER system will be presented to explain 
these special characteristics. 

There has been much research [6,7,8,9,10,11] lately on using the active net-
work framework for network management, of which network monitoring is an
essential part. For instance, the “smart packet” project at BBN Technologies [6]
was specifically designed to address network management issues using active net-
work technologies. A managed node is made programmable by installing a vir-
tual machine to provide a context for executing programs within smart packets.
Management centers can then send programs to the managed nodes to diagnose
and possibly correct problems. In [7], Kawamura and Stadler proposed a new
network management architecture referred to as Active Distributed Management
(ADM). It is implemented as a management middleware composed of several
layers and provides the properties of distributed control and programmability inside
the network using a combination of active network and mobile agent paradigms. 
Management programs can be launched from a management station attached 
to any of the active nodes and may be duplicated among multiple nodes for in- 
herently distributed management tasks. In [9], Greenwood and Gavalas outlined 
a modular, process based architecture for network-element management. Such 
an architecture supports both agent and conventional SNMP [12] messaging. A 
process kernel was proposed as the core of the architecture. Well defined manage- 
ment functions such as automated device registration and local fault diagnosis 
can be pre-loaded into the devices and executed by the kernel. Experimental re- 
sults showed improved response time and reduction in bandwidth consumption 
over traditional management architecture for simple aggregated management 
functions. In [10], Raz and Shavitt presented an active network framework de- 
signed primarily for simplifying management tasks in modern complex networks. 
The main component of the framework is an active engine that is attached to 
any IP router to form an active node. The notion of sessions was introduced to 
generalize the soft state concept and to enable long lasting applications that are 
typical to network management to reside in the active node. An active engine 
is able to access network layer information on its connecting IP router through 
SNMP. 

These projects focus on utilizing the active network paradigm to facilitate 
network management and they all demonstrate the advantages and potentials 
of such an approach. Each of them has a specialized active network architec- 
ture which primarily targets management activities in the control plane. As 
an integral part of network management, network monitoring based on the ac- 
tive network framework will no doubt receive the same benefits. The research 
presented in this paper has a different focus. We concentrate on monitoring of 
general purpose active networks and the applications they support. By taking 
advantage of the great flexibility in the active network paradigm, we not only 
monitor the control plane functions, but are more interested in looking inside 
the data path of the network applications. Our monitoring framework is generic 
enough and can be readily extended to various other active networks and their 
applications. 

The paper is organized as follows: We first give a brief review of the ac- 
tive network paradigm and the ANSWER system model in section 2. We then 
present in section 3 the most interesting features of the ANMOS tool and their 
application scenarios in ANSWER system monitoring. In section 4, we explain 
the unique characteristics of active network monitoring using illustrative exam- 
ples from the ANMOS environment. In section 5, we discuss some of the design 
considerations and the overall structure of ANMOS. Next, we present an AN- 
SWER supported video application and describe how it works seamlessly with 
the ANMOS system in section 6. Finally, we give the conclusion in section 7. 






2 The ANSWER System 

ANSWER is an efficient and flexible information system which employs active 
networks as its core and ontology hierarchies as its semantic framework. It is 
built upon a concept referred to as “information routing” and a symmetrical 
interaction model between information providers and consumers. In short, “in- 
formation routing” is an extension to the traditional network routing functions 
with information discovery mechanisms. An “information routing” capable net- 
work thus can route not only packets with specific destination addresses but 
those that only contain the specifications of their interests. 

2.1 Active Networks 

As a relatively new network paradigm, active networks give more computing 
power to the network nodes which may execute programs invoked by active 
packets[13,14,15,16]. These programs can be either pre-loaded into the nodes or 
carried within the packets. A programmable network gives enormous flexibili- 
ties in new protocol deployment and application specific interactions with the 
network. Such interactions are crucial to the functioning of information routing 
in the ANSWER system. In active networks, security and resource issues are of
paramount concern and must be considered carefully in the network design.

Several active network packages have been developed in recent 
years[13,14,15,16]. Most of these packages allow on-the-fly code loading 
into active nodes and dynamic invocations by active packets. The current 
ANSWER system is implemented using the active network package PLAN 
developed in the CIS department of University of Pennsylvania. This package 
consists of a simple, ML-like functional language with remote evaluation 
primitives and service functions which can be deployed and invoked on active 
nodes. An active PLAN packet is created by bundling a PLAN program into 
the packet. It is then injected into the network and evaluated at a network 
node. As a result of the evaluation, new packets may be created, forwarded 
and evaluated at other nodes in the network. An important PLAN primitive 
is OnRemote(E, H, Rb, Rt), which can be used to evaluate program E on
node H with resource bound Rb and routing function Rt. This primitive
provides a general way to invoke programs on active nodes in the network.
Another primitive, OnNeighbor(E, H, Rb), is a special case of OnRemote
where program E is evaluated on the neighboring node H with resource bound
Rb. Some of the PLAN code segments used in monitoring the ANSWER system 
through the ANMOS interface will be shown in a later section to help explain 
the uniqueness of active network monitoring. 

2.2 The ANSWER Architecture 

The essence of the ANSWER model is that the network becomes capable of 
filtering and directing information flows and eventually completing the binding 
processes for both information providers and consumers. The ANSWER system
efficiently carries out these information tasks through the following processes: 

— Semantic structuring of the ANSWER application data. 

— Building a global ANSWER routing architecture. 

— Abstracting and storing information from the packets injected into the AN- 
SWER network and building routing tables based on such information. 

— Making hop-by-hop routing decisions according to the routing tables and 
possibly using application specific routing code in the packets. 

The ANSWER system provides a generic ontology hierarchy as the basic 
framework for software agents (in the form of active packets within the AN- 
SWER context) to properly interact with each other. Simply speaking, an on- 
tology defines a common vocabulary that may be shared by a group of software 
agents communicating with each other in a consistent way without sharing a 
common knowledge base. Ontology definitions are often represented as hierar- 
chies of classes much in the same way as typical categorization schemes such as 
most Web directories, the Library of Congress classification and the WordNet [17]
synset hypernym structure. Given their simplicity and expressiveness, ontology
hierarchies are also used in the ANSWER system for semantic structuring and
automated information binding. Index information abstracted from both appli-
cation content and user queries is represented in the form of ontology trees
and used as routing tables at the ANSWER nodes. Such indices are properly 
distilled (merged) when necessary, resulting in efficient storage and distribution 
in the network. Generic ontology tree operations such as pruning and merging 
are installed at each ANSWER node while application specific operations may 
be defined and carried inside each packet for refined processing. 
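
To make the generic tree operations concrete, the following Python sketch shows one possible form of merging and pruning. The representation is an assumption made for illustration and is not taken from the ANSWER implementation: each node is a dictionary holding a count of the items indexed at or below that class, plus its child classes.

```python
def leaf(count=0):
    """A node with no child classes."""
    return {"count": count, "children": {}}


def merge(a, b):
    """Merge two ontology trees class by class, adding the counts."""
    names = set(a["children"]) | set(b["children"])
    return {
        "count": a["count"] + b["count"],
        "children": {
            name: merge(a["children"].get(name, leaf()),
                        b["children"].get(name, leaf()))
            for name in names
        },
    }


def prune(tree, depth):
    """Keep only the top `depth` levels below the root. The count at each
    retained node is unchanged, so the summary stays consistent."""
    if depth == 0:
        return leaf(tree["count"])
    return {
        "count": tree["count"],
        "children": {name: prune(child, depth - 1)
                     for name, child in tree["children"].items()},
    }
```

Merging two indices and then pruning the result to a fixed depth gives a compact summary of the kind a node could store or pass on, at the cost of losing detail about the deeper classes.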

In order to guarantee efficient information distribution and simplify the rout- 
ing information gathering process, the ANSWER system employs a loop free 
routing structure. The Core Based Tree (CBT) algorithm[18] designed for IP 
multicasting is used to build and maintain a shared tree as the ANSWER rout- 
ing core. A non-cyclic routing structure ensures that the routing table at each 
ANSWER node only needs to maintain information reachability knowledge on
each routing tree interface, which greatly reduces the number of table entries.
Together with the fact that only index information is stored in the routing tables,
this makes the ANSWER architecture highly scalable. 

Routing tree neighbors periodically exchange their routing tables (ontology 
trees) to propagate reachability information. Only edge ANSWER nodes (i.e. 
those with ANSWER applications attached to them) build their ontology trees 
directly from the content and query packets passing through them. These trees 
are then propagated into the core of the ANSWER system through periodic 
routing table updates. This ensures that no redundant information is sent to 
any ANSWER nodes and that no misrepresentation of information exists in any 
of the ontology trees in the network. 

Fig. 1. The monitoring interface

A distinctive feature of the ANSWER system is the symmetrical behavior
of the information providers and information consumers. In other words, both
provider and consumer messages can be routed based on the traces left in the
system by the opposite types of messages in the form of ontology trees. When
content or query packets are injected into the ANSWER system, the correspond-
ing network nodes check their ontology trees to determine which neighboring
nodes may lead to the matching user queries or application content. Forwarding
decisions are then made based on such information and the resource constraints
reflected in the packets. A packet may be split into multiple packets to reach des-
tinations with potential matching information. In such a case, packet resources
need to be judiciously distributed according to additional statistical informa-
tion (e.g. how many pieces of information on a particular topic can be reached
through a neighboring node) contained in the ontology tree.
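
As an illustration of how packet resources might be distributed when a packet is split, the Python sketch below divides a resource bound among neighbors in proportion to the number of matching items reachable through each. The proportional rule and the function name are assumptions; the text only requires that resources be distributed judiciously according to such statistics.

```python
def split_resources(resource_bound, matches):
    """Divide an integer resource bound among neighbors in proportion to
    the number of matching items reachable through each neighbor.
    `matches` maps a neighbor id to that count."""
    total = sum(matches.values())
    if total == 0:
        return {}  # nothing matches: do not forward
    shares = {
        neighbor: resource_bound * count // total
        for neighbor, count in matches.items()
        if count > 0
    }
    # Hand out units lost to integer division, best-matching neighbors first.
    leftover = resource_bound - sum(shares.values())
    ranked = sorted(shares, key=lambda n: (-matches[n], str(n)))
    for neighbor in ranked[:leftover]:
        shares[neighbor] += 1
    return shares
```

Neighbors with no matching information receive no copy of the packet, and the shares always add up to the original resource bound, so splitting never creates resources.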

3 The ANMOS System Features 

ANMOS serves as a convenient user front end which allows us to observe and 
interact with the ANSWER system. For these purposes, a rich set of features 
are provided and can be easily used. In this section, we present some of the most 
interesting features and illustrate their use through intuitive examples. 
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Figure 1 shows the overall look and feel of the interface. It is divided into two 
panes. The left pane is used to show the topology of the underlying network. This 
includes all the active nodes, the information provider nodes, the information 
consumer nodes and the network links connecting them. Different types of nodes
are illustrated using different sets of icons. For provider nodes, the relative amount
of information content residing on them is illustrated intuitively through the 
heights of the disk-like icons. The users can easily have an overall picture of the 
information distribution in the network. Since ANSWER imposes a routing tree 
structure to prevent loops in query and information distribution, the tree edges 
are also properly shown with a special color. Drawing the network topology and 
its connectivity is the first thing that the interface does when it starts up. The 
necessary information is obtained by querying the network with pre-programmed 
active packets. We shall get into more details of such programs in a later section. 
The right side of the interface is dedicated to showing various ontology trees - the 
universal ontology tree or the instance ontology trees associated with different 
nodes. Each node is shown in this pane as a button with its corresponding id. 
Clicking on a node in the left pane will automatically highlight the corresponding 
button on the right, and vice versa.

We can interact with the ANSWER system in many interesting ways through 
the ANMOS interface. For example, we can modify the routing tree structure, 
examine the ontology trees on different nodes and inject query/information from 
end nodes. One typical way of starting such interactions is to bring up the 
popup menu associated with each node in the left pane. Functions related to the 
whole network are provided in the form of command buttons. Among them, two 
functions are worth noting: the “populate network” function, which injects
predefined packets into the network, and the “clear network” function,
which resets the network to its initial state (e.g. all ontology trees become
empty). The latter is quite useful for running an experiment multiple times without
restarting the whole system. In the following, we will focus on how query and 
information packets can be injected into the ANSWER network through the 
interface. 

Since ANMOS is able to determine the type of each node from the information 
provided by the underlying network, the type of packets that can be injected 
from a node is thus also determined. Injecting a query packet from a consumer 
node or an information packet from a provider node is quite similar: a popup
window asks the user for further details (e.g. the ontological category the packet
belongs to), and the user inputs are then properly formatted into the
packet and sent into the ANSWER network. Despite the similarity, there are a
number of differences. First, a provider node also supplies the source address (the
URL) of the information that the packet represents. This address is later used by
matching query packets to properly retrieve the information. This is unnecessary
for a query packet. Second, when a category in an ontology tree is selected before
sending out a query packet, the relative percentage of information corresponding
to this category among all the provider nodes is illustrated as pie charts on top
of the disk icons. This immediately gives the user an idea of the distribution of
such information in the network and helps the user understand the query routing
process. 

In order for the users to observe the paths taken by the packets in the AN- 
SWER network, we also provide a very useful animation feature in the ANMOS 
interface. When sending a packet, the user can turn on the trace option and 
subsequently see the routing process of the packet as an animation trace. The 
trace can be properly shown even if the packet splits at certain nodes. The resource
distribution among the resulting packets is also displayed intuitively. Figure 1
illustrates a snapshot of the ANMOS interface. At the moment the snapshot is 
taken, a packet has just split into two packets which are sent towards two con- 
tent servers with possible matching information. The inner circle of each packet 
represents the amount of resource distributed into the packet. A bigger circle 
represents a larger amount which intuitively translates to a higher confidence 
level that relevant information can be found in that direction. 



4 The Uniqueness of Active Monitoring 

Traditional networks rely on network management functions to provide
monitoring support, which in general consists of a fixed and limited set of
functions [12,19]. For example, the most dominant network management
protocol in the Internet, SNMP, defines four basic commands for network
management entities to query and control network devices. These commands
are performed by management agents which interface with management
databases [20,21,22] containing a predefined set of parameters.

As we mentioned earlier, network monitoring is especially important and 
may have to cover many more aspects in active networks primarily due to the 
flexible architecture brought about by network programmability. On the other 
hand, programmability also can greatly facilitate monitoring in active networks. 
Monitoring activities can be coded into active packets and performed at the 
network nodes in a way that is uniform, flexible and extensible. We shall explain 
these special characteristics of active network monitoring in the following
subsections.



4.1 Uniformity 

In traditional networks, the control plane and the data plane are both concep- 
tually and often operationally separated. Network monitoring actions reside in 
the control plane as an integral part of the network management functions. The 
clear separation of the two planes simplifies the design of the network nodes, but 
it also greatly limits the scope of the monitoring activities. This is because the
packet processing path is orthogonal to the monitoring control path, so there
is very little information, if any, in the individual packet flows that can be
collected by the control plane. For this reason, it is virtually impossible to perform
monitoring on flows belonging to a certain class of application or to observe the
precise impact of higher layer protocols on the network performance, unless it is
specifically enabled by the designer of the node.

fun send_packet (query_string : string,
                 direction : host*int) : string =
  let val host = #1 direction
      val resource = #2 direction
  in (
    OnNeighbor (|route_packet| (query_string), host, resource);
    query_string
  )

fun route_packet (query_string : string) : unit =
  let val directions : host*int list =
        search_ontology_tree (query_string)
  in
    foldl (send_packet, query_string, directions)

Fig. 2. A simple ANSWER routing code segment

In active networks, although the control plane and the data plane may still 
be conceptually separated from each other, their distinction from an implemen- 
tation point of view can be very blurred. Since active packets can carry programs 
and invoke services at network nodes, network monitoring and packet processing 
functions can be realized in a uniform way through the execution of active
programs. Their major distinction is reflected in the access permissions of their
corresponding service functions on the network nodes. These different permissions
are designed to enforce the security framework for the operational protection of
the active nodes. From a monitoring point of view, however, the interactions with 
the two planes appear the same. This uniformity not only simplifies monitoring 
tasks, but makes it possible to observe and interact with the packet flows in the 
network, thus providing a much more powerful set of monitoring functions. 

As an example, let us consider a typical routing function used by an
ANSWER query packet. When the packet arrives at an ANSWER node, such a
function first checks the ontology tree of the node and finds all the ports
through which matching information can be reached. It then distributes the
query packet along these directions accordingly. The simplified PLAN code
segment in Figure 2 illustrates this function.

The flexibility brought about by the active network paradigm not only gives 
ANSWER applications much freedom in implementing a wide range of routing 
functions, but enables easy monitoring of the routing process. For example, we 
can simply modify the above code segment to let the packet report back its route 
selection and resource distribution decisions along the way to the monitoring 
host. The code segment in Figure 3 illustrates the modification.

This is indeed how the packet animation in ANMOS obtains the routing
traces from the network.

fun route_packet (query_string : string,
                  monitoring_host : host) : unit =
  let val directions : host*int list =
        search_ontology_tree (query_string)
  in (
    foldl (send_packet, query_string, directions);
    OnRemote (|deliver| (getImplicitPort (),
                         ("Directions", directions)),
              monitoring_host, 20, defaultRoute)
  )

Fig. 3. Code segment for routing trace report 
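
The reporting behavior of the routing trace can be imitated outside PLAN with a small Python simulation. It is a sketch under stated assumptions, not the ANMOS code: the `tables` dictionary stands in for the ontology-tree lookup, the `reports` list plays the role of the monitoring host, and the proportional budget rule is invented for the example. The network is assumed to be a tree, as ANSWER requires, so the recursion terminates.

```python
def route_packet(node, query, resource, tables, reports):
    """Forward a query from `node`, reporting each decision to a monitor.

    tables[node][query] is a list of (neighbor, share) pairs (assumed
    format); reports collects (node, chosen_neighbors) tuples.
    """
    directions = tables.get(node, {}).get(query, [])
    total = sum(share for _, share in directions)
    # Report the routing decision, as the trace variant of the code does.
    reports.append((node, [n for n, _ in directions]))
    for neighbor, share in directions:
        budget = resource * share // total
        if budget > 0:  # a packet with no resource left is dropped
            route_packet(neighbor, query, budget, tables, reports)
```

After a call such as `route_packet("n1", "jazz", 8, tables, reports)`, the `reports` list holds one entry per visited node, which is the information an animation needs to replay the path, including any splits.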



4.2 Flexibility and Extensibility 

In traditional networks, network nodes are treated as simple forwarding engines 
which are mainly responsible for properly moving network data from incoming 
ports to outgoing ports. This view dictates that networks are not given much 
more intelligence than making routing decisions. Although network management 
functions are separate from the data plane, they are also quite restricted and 
passive in nature because of the limited intelligence present in the network
architecture. As a consequence, network management functions are generally offered
through a client-server model in which client software outside the network
interacts with management servers in the network to obtain status data and make
diagnoses. In general, these servers provide a fixed interface through which a
pre-defined set of parameters can be queried and set. This model offers little
flexibility for software agents to make complex management decisions and can
be very inefficient for certain tasks. For instance, large sets of data may have to
be moved from each node to the client host, where most of the data will
simply be discarded because certain conditions are not met. It is also difficult
to add new types of management parameters to the server data set. As
an example, in SNMP, parameter extensions can only be made when designing a
new network device; once the device is in service, there is normally no way to
extend the parameter set.
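
The cost of the fixed interface can be made concrete by counting what crosses the network. The Python sketch below is illustrative only; the table layout, the parameter names and both function names are assumptions. It contrasts fetching every node's full parameter table with evaluating the condition where the data lives, which is the alternative that programmable nodes make possible.

```python
def client_server_check(node_tables, condition):
    """Fetch every node's full table, then filter at the client."""
    moved = sum(len(table) for table in node_tables.values())
    hits = sorted(n for n, table in node_tables.items() if condition(table))
    return hits, moved


def on_node_check(node_tables, condition):
    """Evaluate the condition at each node; only matching nodes report."""
    hits = sorted(n for n, table in node_tables.items() if condition(table))
    return hits, len(hits)
```

Both functions find the same nodes; they differ only in the second return value, the number of items that would have to travel to the client.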

Active networks, on the other hand, offer a much more flexible framework 
for network management. In particular, network conditions can be monitored 
and analyzed very conveniently through custom designed active packets. These 
packets may invoke various service functions at the network nodes and make 
intelligent decisions on their own. This is especially useful when a wide variety 
of parameters need to be examined and synthesized across a large set of network 
nodes. The programmability of the network also makes new service extension 
feasible. Additional management services can be installed at network nodes on 
demand to provide information customized to specific network monitoring needs. 
Such extensibility is very difficult, if at all possible, in traditional networks.

To illustrate these advantages in active network monitoring, we shall show
a simple example in the ANMOS context. In the ANSWER system, ontology 
trees in the network nodes are used as routing tables to direct query and con- 
tent packets to their matching destinations. We can think of each node in the 
ontology tree as a multicast group in the traditional network context. At a cer- 
tain stage, we decided to add to ANMOS the capability to monitor the network 
traffic belonging to each ontology category. To do this, we modified the routing 
table distribution function to record the number of queries belonging to each 
ontology category and provided a generic function called “get_tree_summary” to 
return the list of ontology nodes with their corresponding query counts. These 
functions were then dynamically loaded into the ANSWER nodes. The added 
functionality allows the observers to query each node and display special condi- 
tions accordingly. For example, the interface may highlight a network node when 
its ontology tree contains a node with a traffic index above a certain threshold. 
With a client-server model, the monitoring software has to download the whole 
tree from each network node, analyze it and determine if the condition is met. 
This would inevitably result in a very inefficient implementation and cause heavy 
load in the network. In the ANSWER network, this task can be efficiently car- 
ried out by an active packet which can be flexibly adapted according to the 
monitoring requirements. The simplified code segment in Figure 4 illustrates this
point:



fun alert_condition(otree : string*count list) : bool =

fun send_to_neighbor(nb : host) : unit =

fun check_ontology_tree (monitoring_host : host) : unit =
  let val ontology_tree : string*count list = get_tree_summary ()
      val neighbor_list : host list = get_tree_neighbors ()
  in (
    if alert_condition(ontology_tree)
    then
      let val my_host = thisHost()
      in
        OnRemote (|deliver| (getImplicitPort (),
                             ("Traffic Alert", my_host)),
                  monitoring_host, 20, defaultRoute)
    else ();
    foldl (send_to_neighbor, (), neighbor_list)
  )

Fig. 4. Code segment for ontology tree processing at active node

5 The ANMOS System Design

While the primary purpose of ANMOS is to provide a monitoring interface for 
the ANSWER system, we designed it in a more generic way from the very be- 
ginning. This is based on the observation that most system monitoring tools 
have very similar structures and functionalities. Thus one of the design goals of 
ANMOS was adaptability and extensibility. More specifically, we would like to 
apply the same system to a different underlying active network with minimal 
changes. This requires that the core of ANMOS be shielded from the details 
of the communication with the ANSWER network. What we need is a layer 
which provides the logical representation of the underlying network. This layer 
is responsible for maintaining a consistent view of the network for the graphical 
interface regardless of the possibly diverse network semantics. 



5.1 Three Layers 

Based on this observation, we divided the ANMOS system into three layers: 

The graphical interface layer: This layer manages the graphical components
  of the user interface, provides support for user interactions and visually
  presents the details of the ANSWER network.

The logical network layer: This layer serves as the interconnection unit
  between the visual interface and the network-specific elements. Entities in the
  network such as nodes, edges and ports are represented through logical
  components, some of which may be composite.

The network communication layer: This layer is responsible for the actual
  communication with the ANSWER network. User requests are converted
  into active packets and network responses are translated into messages for
  the corresponding components in the logical layer.

Figure 5 illustrates the system architecture of ANMOS and the relationships 
among these three layers. As is shown in the figure, ANMOS has an event driven 
architecture. Each layer exposes to its upper layer a set of APIs which are used 
by the upper layer to send requests downward. In the upward direction, the 
interactions are in the form of event dispatching. For example, a logical node 
in the logical network layer may register with its counterpart in the network 
communication layer and express its interests in certain events such as ontology 
tree changes. An interface component representing the node may in turn register 
with the logical node and update the visual interface whenever a specified change 
occurs. This chain of event registration and propagation ensures the efficient and 
prompt operation of the ANMOS system. Note that the three layers have the 
same design pattern. This not only simplifies the design but also allows us to 
reuse many components in all layers.

Fig. 5. The architecture of ANMOS



5.2 Network Communication Layer 

Simply speaking, this layer provides a “shell” between the ANSWER network 
and the rest of the ANMOS system. This shell shields the upper layers from 
the details of the underlying network and provides the functionalities required 
by the upper layers with a clean and intuitive API. It is also the hub for all 
network messages delivered to ANMOS. The results from the execution of the 
active programs in the ANSWER network are filtered and translated into cor- 
responding events to the intended entities in the logical network layer. We list 
a few important API commands and events in the following. Note that we use 
strings for the API commands mainly for extensibility, i.e. there is no need to
extend the interface or add new constant definitions each time a new command
is introduced. 

- API commands
  • “find nodes”: Find all the nodes in the network.
  • “node edges”: Find all the edges of a node.
  • “ontology tree”: Get the ontology tree on a node.
  • “send packet”: Inject a packet into a node.
  • “trace”: Send a packet with trace turned on.

- Events
  • NODE/EDGE/PORT: An event related to a node/edge/port.
  • TRACE: The event contains the information on a trace.
  • ONTO: An event related to an ontology tree.
  • TOPIC: The event relates to a certain topic in the ontology tree.
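
The extensibility argument for string-keyed commands listed above can be sketched in Python. This is a hypothetical illustration, not the ANMOS API: the class `CommunicationAPI`, its methods and the toy `network` dictionary are all assumptions; only the command strings “find nodes” and “node edges” come from the list.

```python
class CommunicationAPI:
    """String-keyed command table: new commands need no interface change."""

    def __init__(self):
        self._handlers = {}

    def add_command(self, name, handler):
        self._handlers[name] = handler

    def execute(self, name, *args):
        if name not in self._handlers:
            raise KeyError(f"unknown command: {name!r}")
        return self._handlers[name](*args)


# A toy network standing in for ANSWER: node -> list of neighbors.
network = {"n1": ["n2", "n3"], "n2": ["n1"], "n3": ["n1"]}

api = CommunicationAPI()
api.add_command("find nodes", lambda: sorted(network))
api.add_command("node edges", lambda node: [(node, nb) for nb in network[node]])
```

A command introduced later is registered with one more `add_command` call; callers keep using `execute` with a new string, so no method signature or constant table has to change.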




Based on its functionality, this layer can be decomposed into several major
parts: the front end stub, the code mapper, the message interpreter, the
name mapper and the base. All of them are defined as abstract interfaces. The
current system has an implementation for the ANSWER system using the PLAN
package, but it should be straightforward to adapt them for other active net- 
work systems. Figure 6 illustrates these components and their relationships. The 
front end stub handles the communication with the ANSWER network includ- 
ing packet injection into the network and asynchronous packet reception from 
the network. The PLAN code mapper selects the right PLAN code for a desired 
command and properly packs the program and the user supplied parameters 
into the active packet. The message interpreter decodes results returned from 
the network and translates them into messages understood by the upper layer. 
The name mapper maintains a database mapping between the physical addresses
of the ANSWER network and more mnemonic host names. It also maintains state
shared between the code mapper and the message interpreter for proper asynchronous
message identification. Finally, the base part glues the above components
together and presents a coherent API to the upper layer.
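
The interplay between the code mapper and the name mapper can be sketched as follows. Everything here is hypothetical: the template string is a placeholder rather than real PLAN code, and the class and function names are invented for the example. The sketch shows only the bookkeeping idea, namely that a request id issued when a command is packed lets a later, asynchronous reply be matched to its node.

```python
import itertools

# Hypothetical code template; a real PLAN program would go here.
TEMPLATES = {"ontology tree": "get_tree({dest}, request={rid})"}


class NameMapper:
    """Maps logical names to addresses and tracks outstanding requests."""

    def __init__(self, addresses):
        self.addresses = addresses   # logical name -> network address
        self.pending = {}            # request id -> logical node name
        self._ids = itertools.count(1)

    def new_request(self, node_name):
        rid = next(self._ids)
        self.pending[rid] = node_name
        return rid, self.addresses[node_name]

    def resolve(self, rid):
        """Return the node a reply belongs to, or None if it is not expected."""
        return self.pending.pop(rid, None)


def map_command(command, node_name, mapper):
    """Code mapper: pick a template and fill in address and request id."""
    rid, dest = mapper.new_request(node_name)
    return rid, TEMPLATES[command].format(dest=dest, rid=rid)
```

A reply whose id is no longer pending, for example a duplicate, resolves to `None` and can be discarded by the message interpreter.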

5.3 The Logical Network Layer 

The most important purpose of this layer is to materialize the model of inter- 
relationships among various logical entities in the ANSWER system. Such en- 
tities can take different forms some of which may not have exact counterparts 
in the underlying network. For example, a neighborhood of network nodes may 
be grouped into a domain and managed together by the logical network con- 
troller. Such a domain may be created through the graphical interface for ease 
of observation, or it may be decided by the logical layer to reflect organizational 
or operative boundaries. Aggregate statistical information can be calculated and
conveyed to the neighboring layers. 
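
A composite entity of this kind might look like the Python sketch below. The class names, the `query_count` attribute and the two aggregate methods are assumptions chosen for illustration; the paper does not specify which statistics a domain computes.

```python
class LogicalNode:
    def __init__(self, name, query_count=0):
        self.name = name
        self.query_count = query_count


class Domain:
    """A composite logical entity with no direct counterpart in the network."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def total_queries(self):
        return sum(node.query_count for node in self.members)

    def busiest(self):
        """Member with the highest query count, or None for an empty domain."""
        return max(self.members, key=lambda n: n.query_count, default=None)
```

The graphical layer could then show one figure for the whole domain instead of one per node.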

While there are many possibilities of composite entities, the most common 
units in this layer are those basic ones which have obvious correspondence in 
the other layers. These include different types of nodes, outgoing and incoming 
ports on these nodes and links connecting the neighboring ports. These entities 
register with the network entities in the lower layer and can in turn accept 
event registrations from the interface layer entities. Multiple entities at this layer 
may register with the same entity in the lower layer (e.g. two neighboring logical
ports may register with the same link to determine their connectivity) and vice
versa (e.g. a logical link may register with two neighboring nodes in order to
obtain the traffic pattern in both directions). This is also true for the event 
relationship between the top two layers. 
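
The registration chain between the layers can be sketched in Python as a plain observer pattern. This is a minimal sketch, assuming a single `Entity` class for all three layers; the class, its methods and the payload are invented for the example, and only the EDGE and TRACE event names come from the paper.

```python
class Entity:
    """An object in any layer: it emits events and can listen to a lower one."""

    def __init__(self, name):
        self.name = name
        self.listeners = {}   # event type -> list of callbacks
        self.seen = []        # events this entity has handled

    def register(self, event_type, callback):
        self.listeners.setdefault(event_type, []).append(callback)

    def dispatch(self, event_type, payload):
        self.seen.append((event_type, payload))
        for callback in self.listeners.get(event_type, []):
            callback(event_type, payload)

    def follow(self, lower, event_type):
        """Register with a lower-layer entity and re-dispatch its events."""
        lower.register(event_type, self.dispatch)


# Two logical ports register with the same communication-layer link,
# and one interface component registers with one of the ports.
comm_link = Entity("comm:link")
port_a = Entity("logical:portA")
port_b = Entity("logical:portB")
gui_a = Entity("gui:portA")
port_a.follow(comm_link, "EDGE")
port_b.follow(comm_link, "EDGE")
gui_a.follow(port_a, "EDGE")

comm_link.dispatch("EDGE", "up")
```

After the dispatch, both logical ports and the interface component have seen the EDGE event, while an event type nobody registered for stops at the communication layer.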

5.4 The Graphical Interface Layer 

With the network communication channels and network information manage- 
ment mechanisms in place, the graphical interface layer presents the logical view 
of the network to the user and processes user requests generated from the visual 
interface. Like the lower layers, the operation of this layer also relies heavily on 
events. Some of the events, those related to the graphical Swing elements, are 
handled directly by the Java virtual machine.

In order to dynamically display and manipulate the layout of the ANSWER
network, we developed a Java package called “Lisa”. It is built as a constraint
system which captures the relationships among the graphical components on the 
user interface. As a typical example of such relationships, consider the scenario 
where we drag and move a graphical element representing a switch. It is essential 
that the network links associated with this switch remain connected during and 
after the drag operation. A more complicated case is the trace animation, where
packet icons must follow the links along their paths. In Lisa, the dynamic trace
can be correctly drawn even if a network node on the path is dragged during the 
animation. Lisa provides a rich set of graphical elements such as circles, lines, 
images and groups for the presentation of the ANSWER network. 
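
The simplest form of such a constraint can be sketched in Python. Lisa's actual API is not described in the paper, so the classes below are a stand-in under that assumption and not a general constraint solver: a link never stores its own coordinates but derives them from the icons it connects, so dragging an icon cannot detach it.

```python
class Icon:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def move_to(self, x, y):
        self.x, self.y = x, y


class Link:
    """Endpoints are derived from the icons, never stored separately."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def endpoints(self):
        return (self.a.x, self.a.y), (self.b.x, self.b.y)

    def point_at(self, t):
        """Position of a packet icon a fraction t (0..1) along the link."""
        (x0, y0), (x1, y1) = self.endpoints()
        return x0 + t * (x1 - x0), y0 + t * (y1 - y0)
```

Because `point_at` is recomputed from the current endpoints, a packet icon halfway along the link stays on the link even when one end is dragged during the animation.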

5.5 System Operation 

With the details of each layer in place, we now give a description of the overall 
system operation, including the system startup process and the typical system 
data and control flows. Note from Figure 5 that a “factory” component and 
a “controller” are present in every layer. A factory is used to create new ob- 
jects in its layer and a controller is responsible for managing these objects. At 
system startup, the interconnections among these factory and controller
objects are first set up as shown in Figure 5. As mentioned earlier, the
downward connections are established by simply maintaining a reference to the 
object representing the corresponding layer. The upward connections are prop- 
erly set up through event registration. The network communication layer then 
attempts to connect to the ANSWER network. Upon successful connection, the 
bootstrapping process is completed. The next step, system initialization, gath- 
ers information about all the nodes and edges in the network that are to be 
monitored. Such information can be obtained either through a configuration file, 
or in the case of the ANSWER environment, by sending a probing packet into 
the network. When the result packet comes back, the communication factory 
component consults the “name mapper” and creates a representative object for 
every element referred to in the packet without an existing mapping. The factory
component at the logical network layer is then informed of these network
elements so that the corresponding logical components are created properly. The
logical network controller in turn propagates such information upwards to the
factory component in the interface layer. This results in the creation of the new 
graphical components to be presented on the interface. Finally, the various com- 
ponents in the top two layers register with their corresponding components in 
the lower layers and the ANMOS system is properly set up. 
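
The upward creation cascade during initialization can be sketched in Python. The sketch simplifies the description above in one respect, which is an assumption of the example: each factory notifies the factory above it directly, whereas the paper routes the notification from the logical layer through its controller. The class name, the string objects and the probe result are all made up.

```python
class Factory:
    """Creates one object per element and announces it to the layer above."""

    def __init__(self, layer, upper=None):
        self.layer = layer
        self.upper = upper      # factory of the next layer up, if any
        self.objects = {}       # element id -> representative object

    def ensure(self, element_id):
        if element_id in self.objects:      # existing mapping: nothing to do
            return self.objects[element_id]
        obj = f"{self.layer}:{element_id}"
        self.objects[element_id] = obj
        if self.upper is not None:
            self.upper.ensure(element_id)   # propagate creation upward
        return obj


gui = Factory("gui")
logical = Factory("logical", upper=gui)
comm = Factory("comm", upper=logical)

# Elements named in a made-up probe result; "n1" appears twice.
for element in ["n1", "n2", "n1", "n1-n2"]:
    comm.ensure(element)
```

Each element ends up with exactly one representative per layer, even if the probe result mentions it more than once.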

To illustrate the typical network operation after the initialization process, 
let us walk through an example in which the user requests the ontology tree 
on a network node. After the user selects the corresponding command from the 
popup menu associated with the node, the graphical layer invokes the proper
method in the logical layer. This method passes the “ontology tree” command 
and the logical name of the node to the communication layer. The code mapper 
in this layer then selects the corresponding PLAN code template and fills in
the destination address and request id obtained from the name mapper. The
resulting code is packed into a PLAN packet and injected into the ANSWER
system. Note that the request id is maintained by the name mapper to associate
state information such as node name and timestamp with a request. When the
PLAN packet containing the result is returned from the network, it is passed
to the message interpreter which, among other things, extracts the request id
and the message id. The request id is used by the name mapper to determine
the corresponding node name and the validity of the result, e.g. whether the
user has cancelled the request or a newer version of the ontology tree arrived
earlier. The message id is checked against the message table to find out how
the message should be handled. In this case, the ontology tree is extracted from
the packet and an ontology tree update event is sent to the logical node object
which registered for this event with the base part. Following the same event pattern,
the ontology tree will eventually be propagated into the corresponding graphical
node object and displayed on the ANMOS interface.

Fig. 7. The video client interface

6 The Video Application Interface

Built on top of the ANSWER backbone, we developed a video application in 
which music videos can be searched and distributed using an ontology tree based 
categorization scheme. The graphical user front end allows the users to input 
queries and display search results. Query results are returned to the interface 
as destination “URLs” for the matched videos. These destinations are then con- 
tacted and thumbnail images from the corresponding videos are extracted and 
displayed in the interface together with some textual description. The user may 
start playing the desired video by selecting its image representation. The sym- 
metrical information routing model in ANSWER makes another useful feature 
possible in this video application: As soon as a new video is added into the sys- 
tem, any video client who has issued a relevant search before will be immediately 
notified and a new thumbnail image will pop into the client interface. Figure 7 
shows the client interface. 

To better illustrate the operation of the ANSWER system in searching and 
distributing the video contents, we enhanced the ANMOS system such that it 
provides support for external applications. This requires that the active packets 
injected by the applications be able to send messages which can be correctly 
interpreted by the ANMOS system. Currently, the ANMOS system supports two
external application extensions: trace messages and topic distribution messages. 
The latter is used to inform ANMOS that a particular topic has been selected 
by the external application. This results in the intuitive pie chart illustration on 
top of the provider icons in Figure 1 representing the percentage distribution of 
the selected topic among all the provider nodes. 
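
The quantity behind each pie chart can be computed as in the Python sketch below. The input format and the function name are assumptions; the paper only states that the charts show the percentage distribution of the selected topic among the provider nodes.

```python
def topic_distribution(counts):
    """Percentage of a topic's items held by each provider node.

    counts maps a provider id to the number of items it holds for the
    selected topic (assumed input format).
    """
    total = sum(counts.values())
    if total == 0:
        return {provider: 0.0 for provider in counts}
    return {provider: 100.0 * n / total for provider, n in counts.items()}
```

A provider holding one of four matching items would be drawn with a quarter-filled pie.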

The front end of the video application is written in itk3.0 and uses the
MpegTV package for video display. The communication with the ANSWER
network is supported through the communication stub found in the ANMOS
network layer. A simple Java application serves as a wrapper around this stub
and provides a simple API to communicate with the video front end through a 
pipe. 

7 Conclusion 

Network monitoring is an important task in network management and is of
particular importance in active networks. Because of the flexibility brought about by
the programmability of active networks, network monitoring has many unique
features compared with traditional networks. Some of these features are
thoroughly examined in this paper using the ANSWER system and its ANMOS
interface as examples.

ANMOS was designed as a monitoring system for the ANSWER system. Our 
main goal was to give the users a means to look inside the network and to interact 
with the ANSWER system in an intuitive and meaningful way. The system was 
designed to be flexible and extensible, and to be easily adapted to other types of
underlying active networks. The current implementation has achieved most of
the objectives of the ANMOS system and has become a useful tool for debugging,
monitoring and demonstrating the capabilities of the ANSWER model. 

The system is still under constant improvement. Some of the more important
new features currently under development include multiple demonstration
modes (continuous mode or step mode), dynamic network load illustration
and automatic layout for large networks. We are also considering enhancing
ANMOS with network management capabilities such as dynamically adding or
removing nodes from the ANSWER network. Last but not least, we are
extending ANMOS to interface with different kinds of active networks. A generic
framework is being designed to seamlessly integrate a wide range of monitoring 
activities into ANMOS for applications across various active network platforms.

Abstract. This paper proposes a new policy networking system, which
introduces programmable packets using active network technology. The
architecture has two novel features: 1) network elements are allowed to
become decision points, so that the traffic between network elements
and the system decreases dramatically; 2) the programmable packet, which
can itself be a part of a “decision point”, is generated by the system and
executed in the network element, allowing the network element’s decision
point to be autonomously “intelligent”. After describing the current
status of policy networking and active networking, the paper presents
the framework of the new architecture and two advantages of this novel
architecture that are crucial for deploying and developing policy networking
systems: 1) more customizability than existing policy networking
architectures; 2) less control traffic, achieved by reducing interaction between
the Policy Server and network elements. Finally, the paper discusses the future
direction of policy networking and the feasibility of this new architecture.



1 Introduction 

There is a need to provide a novel network management environment that enables 
users to have desired QoS on a per-application basis. Although some products that are
categorized as policy networking systems have appeared, their architecture heavily
depends on legacy network management systems, which take the form of a
centralized system or, at best, a distributed system in one domain using middleware such as
CORBA (Common Object Request Broker Architecture) or RMI (Remote Method
Invocation). Even though the latter may provide scalability, it is logically centralized
as long as the system and network elements are separated by control protocols such as
SNMP, COPS, or a vendor-dependent Command Line Interface (CLI). The major
drawback of such legacy control protocols is a dramatic increase in traffic, which
imposes a large overhead on the system. Moreover, it is very cumbersome to keep up with
version upgrades of such protocols.

Therefore, the system needs to be enlarged in proportion to the increase in the num- 
ber of network elements or users in order to manage them. 
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This paper proposes architecture of policy networking system, which introduces pro- 
grammable packets using active network technology. The system has two novel fea- 
tures: 1) Allowing network elements to become a decision point so that the traffic 
between network elements and the system dramatically decreases 2) The programma- 
ble packet that can itself be a part of a “decision poinf’ is generated by the system and 
executed in the network element, allowing the network element’s decision point to be 
autonomously “intelligent”. 

The paper mainly describes the framework of the architecture and priority of the 
system comparing with legacy network/element management system. Also the paper 
shows the feasibility of applying the system to control legacy network elements. 

Section 2 describes the brief status of the current policy networking architecture, 
followed by Section 3, which gives the description of the current status of active net- 
work technology. In Section 4, a discussion is performed about the new policy net- 
working architecture in which the notion of programmable packets is introduced. 
Section 5 describes two advantages of our architecture. Section 6 shows our future 
works and the direction of Policy networking and Section 7 summarizes the paper. 



2 Current Status of Policy Networking 

Due to the dramatic growth of the Internet, the Internet Protocol (IP) is becoming 
ubiquitous and dominant. As a result, every kind of traffic uses this protocol, which 
causes a QOS problem. Bursty traffic such as email or data packets can adversely 
affect traffic such as voice or video, which is sensitive to jitter and latency. 

To avoid this effect, an architecture that provides differentiated services 
in network domains is under development [1]. Moreover, to provide fine-grained 
services on a per-user or per-application basis, the Policy networking architecture 
is on its way to deployment [2]. This architecture is based on the notion that 
users/applications and network elements (switches, routers etc.) in a network should 
be managed in the same repository and the interaction among them should be 
represented as a “policy”. The traffic can therefore be managed and given suitable 
QOS in a logically centralized way. The development and standardization of the 
repository schema and the information model are being carried out by organizations 
such as IETF (Internet Engineering Task Force) [3] and DMTF (Desktop Management Task 
Force) [4], while the policy framework is being developed within IETF. Fig. 1 shows 
the overall framework of policy networking. A user or a manager defines a desired 
policy (for example, “allocate minimum 64kbps for Tom’s voice traffic if available”) 
using user and network element data in a repository and applies this policy to the 
network element. To make the repository more usable, LDAP (Lightweight Directory 
Access Protocol) [5] is used as an access protocol. As a communication protocol 
between the Policy Server (or Policy Consumer) and the Policy Target (network 
element), the COPS (Common Open Policy Service) Protocol [6] and other existing 
protocols such as CLI (Command Line Interface) and SNMP (Simple Network Management 
Protocol) are used. 

The major drawbacks of the current framework are as follows: 

1. Policies are sent via various kinds of protocols (COPS, CLI or SNMP), which 
might cause different results among nodes even if the same policies are applied. 

2. Control traffic increases in proportion to the number of policies. 

3. Poor scalability: the Policy Server has to add a control connection every time 
new nodes are introduced in an administrative domain, which might force the Policy 
Server to expand its capacity. 

4. Difficulty in keeping up with version upgrades of protocols. 

For example, the current COPS protocol conveys policies using PIB (Policy Infor- 
mation Base) which is not standardized yet. Therefore, many policy-enabled routers 
and switches are forced to implement vendor-specific PIBs. This causes poor 
interoperability among them. 

Fig. 1 Current Policy Networking Architecture 



3 Current Status of Active Networking 

Active networking is a framework that makes it possible to customize a network on a 
per-user or even a per-packet basis. There are many projects on this framework all 
over the world. The major characteristic of this idea is that routing elements are 
programmed by the programmable packets passing through them. This allows users to 
perform computations while their packets are traversing routing elements. Therefore, 
users are able to optimize and extend the protocols they use, and even develop new 
protocols and services. 

Although there exists no widely agreed terminology, most active networking 
architectures consist of three elements: programmable packets, an Execution 
Environment (EE) and a NodeOS. An EE provides an environment to execute programmable 
packets. Therefore, an EE may exist for every flow or user. A popular example of the 
EE is JVM¹ (Java Virtual Machine). NodeOS is a native Operating System (UNIX, 
NT etc.) that provides resources and their scheduling for EEs. 

Although most of the approaches are software-based, some unique approaches 
such as P4 (Programmable Protocol Processing Pipeline) [7] utilize FPGAs in order to 
improve performance. 

While the active networking architecture is flexible enough to apply to almost all 
kinds of networks, it involves tradeoffs among flexibility, performance and 
security. For example, an ISP that provides only a high-quality access service may 
not want users to run their own programmable packets for security reasons, whereas 
in a LAN, users may need to run programs, for instance to open an urgent TV 
conference. Therefore, it is becoming crucial for active network researchers to find 
out to what extent the architecture can provide flexibility, performance and 
security. 



4 Application of Active Network Architecture to Policy 
Networking 

4.1 Overview 

To solve the issues of current policy networking raised in section 2, we propose to use 
an active packet instead of legacy protocols such as CLIs, SNMP and COPS. As the 
active packet is a program, it can decide what policy should be applied to the node in 
which the program is executed. The concept of this solution is depicted in Fig. 2. The 
active packet generated in a Policy Server is executed in every node. Execution of the 
packet itself means setting policies. Therefore, this characteristic makes the active 
packet itself behave as a Policy Consumer. 
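
The idea that executing the packet is itself the act of setting a policy can be illustrated with a minimal sketch. It is not the authors' implementation (which is based on Java byte code); the names `Node`, `ActivePacket`, `voice_policy` and the "edge"/"core" roles are hypothetical, and the 64 kbps figure is borrowed from the policy example in Section 2.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """Hypothetical network element able to execute active packets."""
    name: str
    role: str  # illustrative attribute, e.g. "edge" or "core"
    policies: Dict[str, int] = field(default_factory=dict)


@dataclass
class ActivePacket:
    """A program that decides, per node, which policy to set."""
    decide: Callable[[Node], Dict[str, int]]

    def execute(self, node: Node) -> None:
        # Executing the packet is what sets the policy on the node.
        node.policies.update(self.decide(node))


def voice_policy(node: Node) -> Dict[str, int]:
    # Assumed rule: reserve 64 kbps for voice on edge nodes only.
    return {"voice_min_kbps": 64} if node.role == "edge" else {}


def run(packet: ActivePacket, nodes: List[Node]) -> None:
    # The same packet is executed in every node it visits.
    for node in nodes:
        packet.execute(node)
```

Because the decision logic travels with the packet, the same program can yield different settings on different nodes without any per-node exchange with the Policy Server.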



¹ JVM is a registered trademark of Sun Microsystems, Inc. 

[Figure labels: Policy Management Tool, Active Program] 

Fig. 2 Policy Networking with New Concept 



4.2 Architecture 

Fig. 3 depicts an overview of the new architecture. The architecture is embedded in 
both managers and nodes. As this figure shows, we implemented the manager in our 
management system, called IP-Edge manager, and the node, called IP-Edge. This 
architecture runs on JVM and the programmable packet is made of Java byte code. The 
architecture consists of the following modules. 

1. GUI Components 

The GUI components give users GUI for Policy operation. They behave as a Policy 
Management Tool. These components are implemented only in the manager. 

2. Active Program Execution System (APES) 

This module provides an environment for executing and controlling programmable 
packets. 

3. Naming Service and Event Service 

In this architecture, each object is provided using CORBA. Naming Service and 
Event Service are made possible by using CORBA COSS. This portion is out of the 
scope of this architecture, meaning that any pair of naming service and event 
service may be applicable. 

Fig. 3 Architecture 



4.3 GUI Components 

GUI Components consist of a GUI module and a logic module. The logic module 
provides a template of the programmable packet. The template is made in the form of 
JavaBeans². The GUI module provides the properties of the JavaBeans to the user. 
Therefore, users are able to customize a policy by editing these properties. 

Program Storage in APES allows an administrator to store and reuse programmable 
packets. Service Modules are easily installed via Service Manager, which contacts the 
manager and downloads newly updated Service Modules. 



4.4 Active Program Execution System (APES) 

Active Program Execution System (APES) is a unique component that controls exe- 
cution and monitoring of programmable packets. APES is installed in both manager 
and node, although their behavior slightly differs. APES consists of the following 
objects: 



² JavaBeans is a registered trademark of Sun Microsystems, Inc. 

1. External Interface 

External Interfaces are used to invoke functions of APES. In the manager’s APES, 
they are used by the GUI components to create, delete, store, and run programmable 
packets; in the node’s APES, they are used as a means to send and receive 
programmable packets. 

2. Program Storage 

Program Storage stores programmable packets persistently so that they can be reused. 
This reduces the administrator’s overhead of creating programmable packets. This 
object resides only in the manager. 

3. Program State Manager 

Program State Manager traces each programmable packet’s behavior, providing 
up-to-date information on the programmable packet (location, status etc.). This 
object resides only in the manager. 

4. Program Execution Module 

Program Execution Module executes programmable packets, invoking Fundamen- 
tal Libraries and Service Modules. 

5. Fundamental Libraries 

Fundamental Libraries are objects that consist of basic functions such as “move 
forward to other node” or “stay there for a desired period of time”. 

6. Service Manager 

Service Manager downloads newly updated Service Modules from the manager 
whenever needed. This gives more flexibility in providing network services to 
nodes. Moreover, this assures consistency in a network domain. Service Manager 
only resides in the node. 

7. Service Modules 

Service Modules are services themselves. Each Service Module consists of network 
services such as configuration of DiffServ, MPLS, routing and so on. Service Mod- 
ule only resides in the node. 



4.5 Behavior 

Fig. 4 briefly shows the procedure of the architecture. First, an administrator 
edits a policy using the GUI components and generates a programmable packet. The 
programmable packet is sent to the Active Program Execution System (APES) via the 
External I/F and stored in Program Storage. Once the administrator executes the 
policy, the programmable packet in Program Storage is invoked and executed in APES. 
While the programmable packet is executed, Fundamental Libraries and Service 
Modules are invoked if needed. 
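
The procedure above can be sketched as a toy model. This is only an illustration under assumptions: the class `Apes`, its method names, and the `diffserv` service module are hypothetical stand-ins, and the real system runs Java byte code on a JVM rather than Python callables.

```python
from typing import Callable, Dict, List

# A programmable packet is modelled as a callable that receives the
# execution environment (assumption made for this sketch only).
Program = Callable[["Apes"], None]


class Apes:
    """Toy stand-in for the Active Program Execution System."""

    def __init__(self) -> None:
        self.storage: Dict[str, Program] = {}  # Program Storage
        self.service_modules: Dict[str, Callable[..., None]] = {}
        self.log: List[str] = []

    def store(self, name: str, program: Program) -> None:
        # External Interface: called by the GUI components.
        self.storage[name] = program

    def run(self, name: str) -> None:
        # Program Execution Module: invoke the stored packet.
        self.storage[name](self)

    def move_to(self, node: str) -> None:
        # Fundamental Library function ("move forward to other node").
        self.log.append(f"move:{node}")

    def call_service(self, module: str, **params: int) -> None:
        # Invoke a Service Module (e.g. a DiffServ configuration).
        self.service_modules[module](**params)


def make_demo() -> Apes:
    apes = Apes()
    apes.service_modules["diffserv"] = (
        lambda **p: apes.log.append(f"diffserv:{sorted(p.items())}")
    )

    def policy(env: Apes) -> None:
        env.move_to("node-1")
        env.call_service("diffserv", min_kbps=64)

    apes.store("voice", policy)
    return apes
```

Calling `make_demo().run("voice")` replays the stored packet, which in turn calls one library function and one service module.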

5 Advantages 

5.1 Customizability 

Existing communication protocols between Policy Consumer and Policy Target include 
SNMP, CLI, and COPS, and other novel protocols may well appear. Therefore, the 
Policy Consumer has to be modified whenever a protocol is upgraded or a new protocol 
appears, which dramatically increases development costs. Moreover, in order to cope 
with every protocol, the service itself might have to be reduced. 

For example, even a slightly complicated policy rule such as “If there’s no traffic 
of president-hosted TV conference, get the QOS for Tom” might be required on some 
occasions, and it is very hard to modify the protocols themselves to meet such needs. 

With programmable packets, the desirable service can be obtained by modifying the 
program in the Policy Consumer, without modifying the protocols themselves. 
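
As a hedged illustration, the example rule can be written as a few lines of program logic instead of a protocol extension. The flow name `president_tv_conference`, the setting key `tom_min_kbps` and the 64 kbps value are assumptions made for this sketch.

```python
from typing import Dict


def tom_qos_policy(active_flows: Dict[str, int]) -> Dict[str, int]:
    """Return the QoS settings to apply, given current flows (name -> kbps).

    Encodes: "If there's no traffic of president-hosted TV conference,
    get the QOS for Tom".
    """
    if active_flows.get("president_tv_conference", 0) > 0:
        return {}  # the conference is running: leave Tom's traffic as is
    return {"tom_min_kbps": 64}
```

Changing the rule then means editing this function in the Policy Consumer; the transport that carries the packet stays the same.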

5.2 Reducing Traffic 

Communications between Policy Targets and the Policy Consumer often generate bursty 
traffic because every network element in a domain has to communicate with the Policy 
Consumer. With programmable packets, this problem can easily be solved. Programmable 
packets autonomously route themselves and traverse all the nodes they should 
control. This behavior dramatically reduces control traffic and the overhead of the 
Policy Consumer, which otherwise has to cope with every Policy Target. As shown in 
Fig. 2, the number of connections between Policy Consumer and Policy Target is only 
two, instead of five, the number of network elements. 
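
The comparison can be made concrete with a small counting sketch. It assumes that a legacy protocol needs one connection per network element, and that a programmable packet leaves the Policy Consumer once and reports back once; both function names are hypothetical.

```python
def connections_legacy(num_elements: int) -> int:
    # Assumption: one control connection per network element.
    return num_elements


def connections_active_packet(num_elements: int) -> int:
    # Assumption: one connection to inject the packet and one to
    # receive it back after it has traversed all elements.
    return 2 if num_elements > 0 else 0
```

Under these assumptions the count stays constant for programmable packets while it grows linearly for per-element protocols.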



6 Future Works and Future Directions of Policy Networking 

As a policy management system gains more functions, it is assumed that the system 
will behave in a more distributed manner using middleware such as CORBA or RMI. 
However, such a distributed system might not be able to cope with a tremendous 
number of network elements and users. Even if it is possible, the system would have 
to be enlarged in proportion to the increase in the number of network elements or 
users in order to manage them. 

To avoid such enlargement, network elements themselves must be more “intelligent”. 

Our approach described in this paper moves toward adding intelligence to network 
elements. In the immediate future we will be getting results out of our prototypes. 
It is certain that this framework is applicable to the whole network management 
system for the following reasons: 

1. The architecture reduces control traffic (SNMP packets etc.), from which network 
management systems also suffer. 

2. The GUI components described above can be applied to customizable EMS components 
and NMS components by adding service modules. 

However, there still exists one major problem in applying the system to NMS/EMS, 
namely how to interact with existing systems. This might be achieved by adding a 
proxy between the new system and the existing one, but the existing technology 
might not provide the existing system with the same services as those in the new 
architecture. This is also the case with the policy application. Therefore, our 
future work is to establish a way to solve this migration problem; the result will 
help deploy both policy applications and NMS/EMS applications using this 
architecture. 



7 Conclusion 

The paper has described a new approach to policy networking, which utilizes active 
networking technology. Along with describing the system itself, the paper has shown 
two advantages of this approach: 1) much more customizability than existing policy 
networking architectures; 2) less control traffic. Finally, the future direction of 
policy networking has been discussed. We are sure our approach is moving in that 
direction. 



Abstract. As a solution to the lack of scalability of the centralized 
management paradigm, the distributed management paradigm has been 
introduced. In this paradigm, on the basis of active nodes in active 
networks, management scripts are downloaded to and executed on managed 
systems. The paradigm, however, contains a serious and unsolved issue: 
the system load becomes ill-balanced over a network because an unexpected 
amount of management scripts is added to the systems’ own tasks. 
Therefore, considering network resource utilization, how to distribute 
and place the management scripts in an appropriate manner is a significant 
issue to be studied in order to hold the balance of the management 
tasks in the overall network. As a solution to the issue, we propose a 
new dynamic load-balancing algorithm for distributed management in 
active networks. The decision on which systems execute the management 
scripts is made dynamically on the basis of the deviation from the 
average CPU utilization of all the systems and the bandwidth needed 
for executing all the management scripts. We theoretically show how to 
find optimal values for a tolerable deviation and a maximum tolerable 
bandwidth for better load-balancing. We evaluate the proposed algorithm 
by applying it to an operational LAN. The results show that the 
proposed algorithm performs well with a trivial overhead, so that it can 
hold the balance among management tasks in an overall network. The 
proposed algorithm could be one of the essential techniques for making the 
distributed management paradigm more promising. 



1 Introduction 

Current network management frameworks such as SNMP (Simple Network Management 
Protocol) [1] and TMN (Telecommunications Management Network) [2] 
follow the centralized management paradigm. Centralized management 
systems perform management tasks such as monitoring, analyzing and controlling 
managed systems. It has been pointed out that scalability is a major issue of the 
paradigm [3,4,5,6,7,8,9,10,11,12]. An increase in the number of managed systems 
results in heavy bandwidth consumption for polling. 

The distributed management paradigm has been proposed to solve the above 
issue on the basis of active nodes in active networks [13,14,15,16]. A typical 
method is called Management by Delegation (MbD) [3,4,5,6,7], in which 
management scripts are downloaded to and executed on managed systems such as 
routers and switches. Another method is management by mobile software 
agents [8,9,10,11,12], in which mobile software agents with some management 
tasks migrate between management and managed systems. We do not distinguish 
these methods hereafter, since they share the assumption that 
programs can be downloaded to and executed on the managed systems. 

Despite a number of studies, many of them only emphasize 
the advantages of the distributed management paradigm over the centralized 
one. They focus on enhanced scalability achieved by reducing the processing load of 
the management systems and by saving the bandwidth for polling. However, as a 
further important issue, load-balancing that includes management tasks in 
an overall network should be taken into consideration. For example, 
if the management scripts are downloaded regardless of the systems’ processing 
load, heavily loaded systems such as border routers may not effectively process 
either their own tasks or the management scripts. This results in an 
undesirable situation from the viewpoint of effective distributed processing, which 
is the strength of the distributed management paradigm. 

We propose a new dynamic load-balancing algorithm for distributed management 
in active networks. It allows us to hold the balance among management 
tasks in the overall network by taking account of: 1) the processing load of the 
management and managed systems, and 2) the bandwidth needed for executing 
all management scripts. The technique used here is to keep the differences 
in the average CPU utilization among these systems within a specified small 
value, referred to as the tolerable deviation hereafter, and to limit the 
bandwidth needed to execute management scripts to within another specified value. 

We implement the proposed algorithm and evaluate its effectiveness from 
the following two viewpoints, by applying the implementation to an operational 
LAN: 1) how well the proposed algorithm can perform load-balancing by making 
use of an optimal value of the tolerable deviation, including the overhead of the 
proposed algorithm, and 2) how to derive the optimal value of the tolerable 
deviation that is effective in practical operation, by applying a newly introduced 
theoretical expression to the processing load of typical management scripts 
measured in the LAN. 

The paper is organized as follows. In Section 2, we present an overview of 
the distributed management paradigm and address its issue with respect to 
load-balancing. As a solution to the issue, in Section 3, we propose a new dynamic 
load-balancing algorithm, and in Section 4, we discuss the implementation of the 
proposed algorithm. In Section 5, we evaluate the effectiveness of the proposed 
algorithm by applying the implementation to an operational network. 



Kiyohito Yoshihara et al. 



[Figure: a management system delegates management tasks by downloading 
management scripts, each describing a part of the management tasks, to managed 
systems (active nodes) such as an (active) router and an (active) switch. The 
scripts access the MIB, aggregate, process, and analyze management information 
(c), and send notification on the results to the management system (d).] 

Fig. 1. Overview of Distributed Management Paradigm (MbD) 



2 Overview of Distributed Management Paradigm and 
Its Issue 

2.1 Overview of Distributed Management Paradigm 

A management system downloads a management script describing a part of the 
management tasks and delegates the task to a managed system (an active node) as 
shown in Figure 1 (a). Some typical management scripts collect management 
information from the MIB (Management Information Base) by means of internal 
polling confined to the managed system (Figure 1 (b)). Others aggregate, process, 
and analyze the collected information (Figure 1 (c)), send notification to the 
management system (Figure 1 (d)), or control the managed systems according 
to the aggregated, processed, and analyzed results of the collected information 
and the given management policy (Figure 1 (e)). This paradigm could solve the 
drawbacks of the centralized management paradigm; in particular, it improves 
scalability by reducing the processing load of management systems and 
saving the bandwidth for polling. 
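
Steps (b) to (d) can be sketched as a tiny management script. This is an illustration only: `read_mib` stands in for whatever local MIB access the active node offers, and the averaging rule and message format are assumptions.

```python
from statistics import mean
from typing import Callable, List, Optional


def monitoring_script(read_mib: Callable[[], float],
                      samples: int,
                      threshold: float) -> Optional[str]:
    """Poll locally, aggregate, and notify only when needed."""
    # (b) internal polling, confined to the managed system
    values: List[float] = [read_mib() for _ in range(samples)]
    # (c) aggregate and analyze the collected information
    avg = mean(values)
    # (d) notify the management system only on a threshold violation
    if avg > threshold:
        return f"threshold exceeded: avg={avg:.1f}"
    return None
```

Only the returned notification crosses the network, which is where the saving in polling bandwidth comes from.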

2.2 Issue towards Promising Distributed Management Paradigm 

Despite a number of studies, many of them only emphasize the advantages 
of the distributed management paradigm over the centralized one, focusing 
on enhanced scalability, fault-tolerance and precise monitoring for health 
function [17]. In the distributed management paradigm, however, a heavy processing 
load may be imposed on specific managed systems, and the paradigm 
cannot necessarily hold the balance among management tasks in an overall network, 
since the management scripts are downloaded and executed regardless of 
the managed systems’ processing load. For example, as shown in Figure 2, more 
and more management scripts may be downloaded to and executed on the border 
(active) router A on a foreign LAN2 and B on a WAN, because the inter-network 
traffic aggregated at the routers potentially congests them and thus they are key 
components for management purposes. This results in performance degradation 
of their own processing, such as packet forwarding, due to the heavy processing load 
caused by execution of the management scripts. 

Fig. 2. Imbalanced Situation in Distributed Management Paradigm 

Therefore, load-balancing for management tasks within the overall network 
on the basis of network resource utilization could be a new significant issue to 
be solved towards a more promising distributed management paradigm. 



3 Proposed Algorithm 

3.1 Principles for Load-Balancing 

As we described in Section 2.1, the distributed management paradigm can cer- 
tainly save the bandwidth for polling, whereas there still remains a possibility to 
impose a heavy processing load on specific managed systems. With respect to the 
bandwidth for polling and processing load of management tasks, the distributed 
management paradigm is complementary to the centralized one in which man- 
agement tasks are concentrated on a management system so that processing load 
of the management tasks on managed systems may be minimized. 

In our proposal, the centralized and distributed management paradigms 
coexist. The proposed algorithm holds the balance among management tasks 
in an overall network based on 1) the processing load of the management and 
managed systems and 2) the bandwidth needed for executing all management 
scripts. That is, as shown in Figure 3, management scripts are executed not 
only on the direct target managed systems to be monitored or controlled, but 
also on other, less loaded systems including management systems, proxies, and 
mediation systems. For example, when the management scripts X and Y for 
monitoring are executed on a non-direct target managed system, the router C, 
for load-balancing, these remote management scripts monitor their direct 
target managed systems, the routers A and B, by means of polling, similarly to 
a management system in the centralized management paradigm (Figure 3 (1) 
and (2)). For the purpose of such load-balancing, the management system moves 
the management scripts from the heavily loaded systems A and B to the less 
loaded destination system C (Figure 3 (3)), and executes them again there. 

By the proposed algorithm, the processing load of all the systems can be bal- 
anced and then the performance of their own processing such as packet forward- 
ing is normalized in the overall network as shown in the right side of Figure 3. 

3.2 Algorithm Description 

In order to achieve the load-balancing based on the principle above and adapt- 
able to network resource utilization, the management system dynamically deter- 
mines a less loaded destination system on which a management script should be 
executed, by making use of the criteria as defined below. 



Giving Criteria for Load-Balancing We have the following two criteria 
concerning network resource utilization for the proposed algorithm. 

1. Deviation from average CPU utilization 

Let $c_{i,T}$ (%) be the average CPU utilization of the system $i$ ($1 \le i \le N$) 
over $T$ seconds, where $N$ denotes the number of management and managed 
systems. Also, as given by Equation (1), let $\delta_{i,T}$ be the deviation of 
$c_{i,T}$ from the average value of $c_{j,T}$ ($1 \le j \le N$) over all the 
systems. The proposed algorithm makes use of $\delta_{i,T}$ as one of the criteria, 
and derives its value from $c_{i,T}$, measured by means of the CPU monitoring 
management script described later in this section. Note that, for a multi-CPU 
system, the average over all the CPUs’ utilization represents its CPU utilization. 

$$\delta_{i,T} = c_{i,T} - \frac{1}{N} \sum_{j=1}^{N} c_{j,T} \qquad (1)$$


2. Bandwidth needed for executing all management scripts 

This is derived manually or automatically. For example, if a management 
script makes use of a specific protocol such as SNMP, then the bandwidth can 
approximately be derived from the length of the PDU (Protocol Data Unit), 
including the number of pieces of management information, their syntax 
and assumed values, and the polling interval. 
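
Both criteria are simple to compute, as the following sketch shows. It assumes that the deviation is the signed difference from the mean (which matches the negative value in the example of Figure 4) and that the polling bandwidth is approximated as PDU bits per polling interval; the function names and sample numbers are illustrative.

```python
from typing import Dict


def deviations(cpu: Dict[str, float]) -> Dict[str, float]:
    """Signed deviation of each system's average CPU utilization (%)
    from the mean over all systems."""
    avg = sum(cpu.values()) / len(cpu)
    return {name: c - avg for name, c in cpu.items()}


def polling_bandwidth_bps(pdu_bytes: int, interval_s: float) -> float:
    """Approximate bandwidth of periodic polling.

    pdu_bytes is the total PDU length exchanged per poll (an assumed
    input, e.g. request plus response)."""
    return pdu_bytes * 8 / interval_s
```

For instance, 100 bytes exchanged every 5 seconds amounts to 160 bps under this approximation.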

If the algorithm cannot determine a management script to be moved for load- 
balancing only by the above criteria, then it makes use of the priority assigned to 
each management script by a network operator as described later in this section. 

Determining Management Script to Be Moved and Its Destination 
System We introduce two thresholds for load-balancing. One is the tolerable 
deviation $\Delta$ (%) and the other is the maximum tolerable bandwidth $B$ (bps). 
The proposed algorithm tries to bound every deviation $\delta_{i,T}$ within 
$\Delta$ and also the bandwidth needed for executing all management scripts within 
$B$. Both thresholds are specified by a network operator; see Section 3.3 on how to 
set their values for effective load-balancing. 

As candidates for load-balancing, the proposed algorithm first selects the 
management scripts on the system whose deviation exceeds the tolerable deviation, 
$\delta_{i,T} > \Delta$, and whose $\delta_{i,T}$ is the maximum 
of all the systems. Next, the proposed algorithm determines the system having 
the minimum average CPU utilization $c_{i,T}$ as the destination system, and from 
the candidates above, determines the management script to be the target for 
load-balancing such that the decrease in bandwidth needed for executing all the 
management scripts after moving it to the destination system is maximized (if there 
is no such management script, it determines the one such that the increase is 
minimized). 

If any move of a management script results in the bandwidth exceeding 
the maximum tolerable bandwidth $B$, then the management system notifies 
a network operator and terminates the execution of the management script with 
the lowest priority. 
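
The selection rules above can be sketched as a single decision function. This is a reading of the text, not the authors' code. It assumes that a smaller priority number means a lower priority, that the script to terminate is chosen among the candidates on the overloaded system, and that each script knows the bandwidth it would need on each system; `Script` and `choose_move` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Script:
    name: str
    host: str                     # system currently executing the script
    bw_by_host: Dict[str, float]  # bandwidth (bps) needed on each system
    priority: int                 # assumption: smaller means lower priority


def choose_move(cpu: Dict[str, float], scripts: List[Script],
                delta: float, max_bw: float
                ) -> Optional[Tuple[str, str, Optional[str]]]:
    avg = sum(cpu.values()) / len(cpu)
    # System with the maximum deviation from the average utilization.
    source = max(cpu, key=lambda s: cpu[s] - avg)
    if cpu[source] - avg <= delta:
        return None  # every deviation is within the tolerable deviation
    # Destination: the system with the minimum average CPU utilization.
    dest = min(cpu, key=lambda s: cpu[s])
    candidates = [s for s in scripts if s.host == source]
    if not candidates:
        return None

    def change(s: Script) -> float:
        # Bandwidth change if s moves to dest (negative = decrease).
        return s.bw_by_host[dest] - s.bw_by_host[s.host]

    # Largest decrease, or smallest increase if nothing decreases.
    best = min(candidates, key=change)
    total = sum(s.bw_by_host[s.host] for s in scripts) + change(best)
    if total > max_bw:
        victim = min(candidates, key=lambda s: s.priority)
        return ("terminate", victim.name, None)
    return ("move", best.name, dest)
```

Taking the minimum of the bandwidth change covers both cases in the text at once, since the largest decrease and the smallest increase are the same extreme of one signed quantity.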

Performing Load-Balancing and Managing Information for Load-Balancing 
The management system moves and downloads the management script 
to the destination system according to the determination as described before and 
updates a set of information required for load-balancing such as System table, 
Management script tables, and the bandwidth consumption described as follows. 

As shown in Figure 4, the management system maintains all the information 
required for load-balancing including System table (Figure 4 (a)), Management 
script tables (Figure 4 (b)), Tolerable deviation ($\Delta$), Maximum tolerable 
bandwidth ($B$), and the bandwidth consumption. 

(a) System table 

System                Average CPU utilization ($c_{i,T}$)   Deviation ($\delta_{i,T}$) 
Management system     46%                                   -4% 
Managed system 1      78%                                   28% 
Managed system 2      53%                                   3% 
...                   ...                                   ... 
Managed system N-1    52%                                   2% 
Average               50% 

(b) Management script tables (one each for Management system, Managed system 1, 
..., Managed system n) 

ID of mgmt.   Bandwidth on         Bandwidth on      Target managed      Priority 
script        management system    managed system    system (*) 
script A      864bps               346bps            Managed system 1    5 
script B      346bps               0bps (**)         Managed system 1    2 
script C      172bps               [unreadable]      Managed system 2    7 

(*) Script's direct target managed system for a management task 
(**) Bandwidth is supposed to be 0 if a management script notifies only in case of 
a threshold violation, for example. 

Tolerable deviation ($\Delta$)     10% 
Max. tolerable BW ($B$) (***)      1.0*10^[?] bps 
Bandwidth consumption              3.2*10^[?] bps 

(***) Maximum tolerable bandwidth (B) 

Fig. 4. Example of Information for Load-balancing 

System table retains the average CPU utilization $c_{i,T}$ for each system and the 
deviation $\delta_{i,T}$ derived from the utilization. In order to monitor the CPU 
utilization, the management system downloads a CPU monitoring management script, 
referred to as CPU monitoring script hereafter, to each management and managed 
system in advance and sets $c_{i,T}$ into System table based on the monitored 
values. A Management script table is provided for each system and retains 
information on the management scripts executed on the system. It includes the 
identifier of a management script executed on the system, the bandwidth needed for 
the execution on both the management and managed systems, and the script’s direct 
target managed system for a management task. The bandwidth consumption is 
the sum of the bandwidth consumed by all the management scripts executed, 
as retained by Management script tables. 
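
One possible in-memory layout for this information is sketched below. The field names are hypothetical, and the rule for picking which of the two bandwidth figures counts toward the consumption (the one matching where the script currently runs) is an assumption about how the tables are meant to be read.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScriptEntry:
    script_id: str
    bw_on_management_bps: float  # bandwidth if run on the management system
    bw_on_managed_bps: float     # bandwidth if run on a managed system
    target: str                  # script's direct target managed system
    priority: int


@dataclass
class LoadBalancingInfo:
    # System table: system name -> average CPU utilization (%)
    cpu_utilization: Dict[str, float] = field(default_factory=dict)
    # Management script tables: system name -> scripts executed there
    script_tables: Dict[str, List[ScriptEntry]] = field(default_factory=dict)
    management_system: str = "mgmt"

    def bandwidth_consumption(self) -> float:
        """Sum the bandwidth of every script at its current location."""
        total = 0.0
        for system, entries in self.script_tables.items():
            for e in entries:
                if system == self.management_system:
                    total += e.bw_on_management_bps
                else:
                    total += e.bw_on_managed_bps
        return total
```

Keeping both bandwidth figures per script lets the management system evaluate a move before performing it.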

3.3 Deriving Tolerable Deviation $\Delta$ and Maximum Tolerable Bandwidth $B$ 

Tolerable Deviation $\Delta$ The smaller the value of the tolerable deviation 
$\Delta$ is, the smaller the difference in the average CPU utilization between any 
two systems, $c_{i,T}$ and $c_{j,T}$ ($1 \le i, j \le N$, $i \ne j$), and the more 
closely the algorithm can hold the balance among management tasks in an overall 
network. However, if the value of $\Delta$ is set too small, then even a small 
difference in the average CPU utilization between two systems may trigger an 
adjustment of load-balancing. This may cause an undesirable effect in which the 
algorithm keeps trying to move and download a management script from one system to 
another. The optimal value of $\Delta$ preventing this effect is the minimum one 
which satisfies the following two conditions simultaneously. 

Condition 1 The value of Δ is greater than the maximum processing load of
all the management scripts for all the systems.

Condition 2 The value of Δ is greater than the maximum processing load of all
the destination systems when a management script is downloaded
there.

For the rationale of Condition 1, we assume a case in which only a single
management script is executed on one system in an overall network and the average
CPU utilization of all the other systems is 0%. Unless Condition 1 holds in
this case, the execution of the management script on any system triggers an
adjustment of load-balancing and then causes the above effect. We can prove
that Δ satisfying Condition 1 for the above case also prevents the effect in
any other case, and is exactly given by Equation (2), where N and C_max denote
the number of systems and the maximum processing load of all the management
scripts for all the systems, respectively.

\[ \Delta > C_{\max} \left( 1 - \frac{1}{N} \right) \qquad (2) \]

Unless Condition 2 holds, the increase in the processing load of the destination
system caused by downloading a management script for load-balancing again
triggers an adjustment of load-balancing, and this also causes the above effect.

Maximum Tolerable Bandwidth B Since user communication traffic and
management traffic share the bandwidth in a network such as a LAN or WAN, it is
generally desirable that the bandwidth for management traffic be restricted to at
most 5% of the minimum bandwidth of the network [1]. Accordingly, for
example, B is set to 100 Kbps, which is 5% of the effective bandwidth of 2 Mbps
in 10 Mbps Ethernet.

3.4 Examples

We show how the proposed algorithm holds the balance of management tasks
by following the flowchart shown in Figure 5, taking the information required for
load-balancing in Figure 4 as an example.
Fig. 5. Flowchart of Proposed Algorithm for Load-balancing

The tolerable deviation Δ and the maximum tolerable bandwidth B are set to
10% and 1.0 × 10^5 bps, respectively (Figure 5, S1). The CPU monitoring scripts
are downloaded to each system (S2). Then the algorithm configures System
table and Management script tables (S3), and downloads management scripts
to the direct target systems and executes them (S4). The algorithm tries to
perform load-balancing since δ_T (= 28%) > Δ (= 10%) (S5). As candidates,
the algorithm selects the management scripts on system 1, which has the maximum
average CPU utilization (S6). The algorithm determines the management system,
which has the minimum average CPU utilization, as the destination system (S7) and
the management script C as the target for load-balancing, since moving it causes
the maximum decrease in the bandwidth consumption, 14 bps (from 172 + 14 bps
to 172 bps) (S8). Since the bandwidth consumption still does not exceed the
maximum tolerable bandwidth B (S9), the algorithm performs load-balancing by
moving the management script C from system 1 to the management system (S10).
Finally, the algorithm updates System table, Management script tables, and the
bandwidth consumption (S11), and returns to S5.

Fig. 6. Implemented Software Configuration
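The walkthrough of steps S5 to S11 in Section 3.4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Script` and `System` classes, the per-host bandwidth dictionary `bw`, and the per-script `load` field are hypothetical simplifications, the deviation is assumed to be the difference between the highest and lowest average CPU utilization, and the priority column of Figure 4 is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Script:
    name: str
    bw: dict      # hypothetical: host name -> bps consumed when run on that host
    load: float   # hypothetical: CPU load (%) the script adds to its host


@dataclass
class System:
    name: str
    cpu: float                                    # average CPU utilization (%)
    scripts: list = field(default_factory=list)   # scripts currently hosted


def total_bw(systems):
    """Bandwidth consumption summed over all scripts at their current hosts."""
    return sum(s.bw[host.name] for host in systems for s in host.scripts)


def balance_step(systems, delta, max_bw):
    """Run S5 to S11 once; return the name of the moved script, or None."""
    src = max(systems, key=lambda s: s.cpu)       # S6: most loaded system
    dst = min(systems, key=lambda s: s.cpu)       # S7: least loaded system
    if src.cpu - dst.cpu <= delta or not src.scripts:
        return None                               # S5: deviation is tolerable
    current = total_bw(systems)

    def bw_after(script):
        # Bandwidth consumption if `script` were moved from src to dst.
        return current - script.bw[src.name] + script.bw[dst.name]

    target = min(src.scripts, key=bw_after)       # S8: largest bandwidth decrease
    if bw_after(target) > max_bw:
        return None                               # S9: move would exceed B
    src.scripts.remove(target)                    # S10: move the script
    dst.scripts.append(target)
    src.cpu -= target.load                        # S11: update the tables
    dst.cpu += target.load
    return target.name
```

With a script C that consumes 172 + 14 bps on system 1 and 172 bps on the management system, and a deviation of 28% against Δ = 10%, one call to `balance_step` moves C, as in the example above. Calling it repeatedly until it returns `None` corresponds to looping back to S5.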



4 Implementation 

We implement the proposed algorithm using JDK v.1.2 [18]. Currently, only the
minimum functionality needed for evaluating the proposed algorithm is implemented.
Figure 6 shows the implemented software configuration. The software runs on an
IBM PC/AT compatible machine, which emulates an active node, that is, a
managed system with enough programmability to download and execute a
management script. We make use of management information defined in RFC1213
MIB-II [19], the legacy SNMP agent, and AdventNet, Inc. SNMPv3 Package 2.2
for the Java API to the SNMP agent. We give IP addresses and the MIB
definition for initial settings. There may potentially be many varieties of management
scripts with different management tasks. We implement two typical management
scripts effective in the distributed management paradigm [7], in addition to the
CPU monitoring script, as shown in Table 1.

These management scripts can be flexibly customized by specifying a few
parameters (Figure 6 (1)). The software then approximates the bandwidth needed
for executing each management script and registers it in the Management
script tables shown in Figure 4. The management script is downloaded to a
managed system through the Java serialization interface (Figure 6 (2)). The
download and unload of a management script, and the start, suspension, and
resumption of its execution, are realized by Java RMI (Remote Method Invocation).
When the execution starts (Figure 6 (3)), the script polls the legacy SNMP
agent according to the specified parameters (Figure 6 (4)). Threshold violations
and aggregated management information are also notified and sent by RMI
(Figure 6 (3)). Since the CPU utilization is not defined in MIB-II, it is collected
through an OS-dependent API (Figure 6 (5)) and sent by RMI (Figure 6 (6)). If
a heavy processing load on a system is detected, the execution of a management
script is suspended, and the management script is downloaded to another,
less loaded system in a serialized form together with part of its execution
context (Figure 6 (7)).

Table 1. Management Scripts in Implementation

Management script A
  Description:          Notifies threshold violation to management system on
                        result of polling
  Parameters (example): Direct target managed system, names of management
                        information for polling, polling interval(*), upper and
                        lower thresholds(*)

Management script B
  Description:          Sends aggregated management information collected by
                        polling for every specified time to management system
  Parameters (example): Direct target managed system, names of management
                        information for collection, polling interval(*), sending
                        interval

CPU monitoring script
  Description:          Sends average CPU utilization collected by polling for
                        every specified time to management system
  Parameters (example): Direct target managed system, polling interval(*),
                        sending interval

(*) Possible to specify a different value for each piece of management information.
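The behavior of management script A in Table 1 (poll, compare against thresholds, notify on violation) can be sketched as follows. This is an illustrative Python sketch only; the original is written in Java and uses SNMP and RMI. The function name `run_threshold_script` and the `poll` and `notify` callables are hypothetical stand-ins for the SNMP query to the local agent and the RMI notification to the management system.

```python
import time
from typing import Callable, List


def run_threshold_script(
    poll: Callable[[str], float],          # stand-in for an SNMP get on the agent
    notify: Callable[[str, float], None],  # stand-in for the RMI notification
    object_names: List[str],
    lower: float,
    upper: float,
    interval_s: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll every object `rounds` times and notify on each threshold violation.

    Returns the number of violations that were reported.
    """
    violations = 0
    for _ in range(rounds):
        for name in object_names:
            value = poll(name)
            if value < lower or value > upper:
                notify(name, value)        # traffic is generated only here
                violations += 1
        sleep(interval_s)                  # wait for the next polling cycle
    return violations
```

Because `notify` is called only when a threshold is violated, a script of this kind consumes no bandwidth toward the management system in the normal case, which is the situation footnote (**) of Figure 4 describes.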

5 Evaluation 

By applying the implementation to an operational LAN, we evaluate how well
the proposed algorithm can hold the balance among management tasks in the
overall network with a trivial overhead. The specifications of the PCs in the
evaluation are shown in Table 2. We set the parameter values of the management
scripts as in Table 3. All the PCs are connected to the same LAN (10 Mbps
Ethernet) and the maximum tolerable bandwidth B is set to 100 Kbps. The
proposed algorithm tries to perform load-balancing every 100 seconds.



Table 2. PC Specifications(*) in Evaluation

Name       CPU                CPU frequency (MHz)   Memory (Mbyte)   Remark
System 1   Intel Pentium      133                   32               Managed system
System 2   Intel Pentium      150                   96               Managed system
System 3   Intel Pentium II   266                   64               Managed system
System 4   Intel Pentium II   450                   64               Managed system
System 5   Intel Pentium II   450                   128              Management system

(*) System OS is Windows NT 4.0 SP5.

Table 3. Parameter Values of Management Scripts in Evaluation

Management script A
  Management information for polling:            Object types of counter syntax in MIB-II ifEntry
  Number of management information for polling:  1, 5, 10, 15, and 20
  Polling interval(*):                            5, 10, 15, 30, 60, and 180 seconds
  Upper threshold:                                500
  Lower threshold:                                0
  Priority:                                       Common to all management scripts

CPU monitoring script
  Polling interval:                               1 second
  Sending interval:                               100 seconds

(*) Common polling interval in case more than one object type is specified.

5.1 Load-Balancing by Proposed Algorithm in Operational LAN

We impose a heavy load on System 2 by executing 20 management scripts, each
polling 1 object type (a piece of management information) every 5 seconds,
and by adding a 25% stationary processing load on the system. We measure the
first time at which no management script has been moved for load-balancing by
the proposed algorithm for the last 10 minutes, referred to as the convergence
time hereafter, and the maximum deviation of the average CPU utilization at
the convergence time, for several values of Δ. In this evaluation, the
management system moves one management script every 100 seconds in sequence,
and ideally all 20 management scripts should be distributed
to other systems. It hence takes at least 2000 seconds (= 20 × 100) for any value
of Δ, which is depicted as the white bar for comparison in Figure 7. The optimal
value of Δ preventing the effect described in Section 3.3 is 21.5%; Section 5.2
shows how this value is derived.

As shown in Figure 7, the convergence time becomes shorter as the value
of Δ becomes greater, whereas the maximum deviation of the average CPU
utilization becomes larger as Δ increases from 22% (21.5% is rounded up to 22%
due to an implementation limitation) to 30%. In this evaluation, when the value
of Δ is 10% smaller than the value of 22%, we could not measure the convergence
time within 90 minutes (= 5400 seconds) in the worst case, which is twice the
convergence time when the value of Δ is 22%. From all the results above, setting
Δ = 22% keeps the differences in the average CPU utilization
among the systems within a small value of 12.7%, as shown in Figure 7, and thus
the proposed algorithm can hold the balance among management tasks in the
overall network.

Fig. 7. Convergence Time and Maximum Deviation of Average CPU Utilization

Furthermore, we evaluate the overhead of the proposed algorithm from
three viewpoints: the bandwidth consumption, the processing load of the
management system, and the processing load of the managed systems.



Bandwidth Consumption The CPU monitoring script consumes only a little
bandwidth for sending a CPU utilization message. The message is one-way and
its size is 60 bytes. This is smaller than a minimum SNMP PDU of approximately
90 bytes, or 180 bytes for a request and a response. In practical operation, the
management system generally polls a few tens of object types and consumes more
bandwidth. This small bandwidth consumption therefore does not bring back the
drawback of the centralized paradigm, namely its heavy bandwidth consumption.
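As a rough worked example, the message sizes above can be turned into average bit rates. The sketch below assumes the 100-second sending interval of the CPU monitoring script and a 5-second polling interval from Table 3, counts only the stated message sizes (lower-layer headers are not considered), and assumes one request and one response per poll. The helper name `message_rate_bps` is hypothetical.

```python
def message_rate_bps(message_bytes: int, interval_s: float) -> float:
    """Average bit rate of one message of `message_bytes` sent every `interval_s`."""
    return message_bytes * 8 / interval_s


# One 60-byte CPU utilization message every 100 seconds (per monitored system).
cpu_report_bps = message_rate_bps(60, 100)

# One SNMP request/response pair (about 180 bytes) every 5 seconds.
snmp_poll_bps = message_rate_bps(180, 5)

# Share of the maximum tolerable bandwidth B = 100 Kbps used by one CPU report stream.
share_of_b = cpu_report_bps / 100_000
```

Under these assumptions a CPU report stream averages a few bits per second, well below a single periodically polled SNMP object.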



Processing Load of Management System The overhead of load-balancing
by the proposed algorithm on the management system lasts 1.4 seconds, with a
maximum momentary processing load of 16.2%. This is approximately equivalent
to that of a single polling of 12 object types from the management system in
the implementation. As stated above, the management system polls
a few tens of object types for purposes such as monitoring. This implies that the
proposed algorithm does not spoil the advantage of the distributed management
paradigm over the centralized one.



Processing Load of Managed System The processing load of the CPU
monitoring scripts is 2.3%, 2.1%, 0.9%, and 0.5% when executed on
Systems 1, 2, 3, and 4, respectively. Each of them is less than or equal to the
processing load of the management script A polling a single object type every 5
seconds when executed on the same system, as shown in Figure 8 (a). This has
hardly any impact on the managed system in practice. On the other hand, as we
have seen, a managed system benefits from load-balancing by the proposed algorithm.
From these results, the overhead of the proposed algorithm is trivial.



5.2 Value of Tolerable Deviation Δ Effective in Practical Operation

First, we derive the minimum value of the tolerable deviation Δ satisfying
Condition 1 in Section 3.3. When the management script A polling 20 object
types every 5 seconds, referred to as A_{20,5} hereafter, is executed on System 1,
which has the lowest processing capability of all the systems, the processing load
is 65.0%, and this implies that C_max = 65.0%. By Equation (2), Δ > 65.0 × (1 − 1/5)
= 52.0 (%). This value is obviously too large to hold the balance of management
tasks in the overall network.

We can get a smaller value of Δ by executing finer grained management
scripts, that is, 20 management scripts each polling 1 object type every 5
seconds, each referred to as A_{1,5} hereafter, which together perform a
management task equivalent to that of A_{20,5}. This allows a smaller value of Δ,
enabling closer load-balancing in the overall network. Figure 8 (b) shows the
processing load of the management script A_{20,5} and of 20 A_{1,5}'s when executed
on Systems 1, 2, 3, and 4. On every system, there is scarcely any difference between
the two executions. This also holds for any other values of the number of object
types for polling and the polling interval in the implementation. By executing
the finer grained management scripts, that is, 20 A_{1,5}'s, on System 1, C_max is
4.2% as indicated by the point X shown in Figure 8 (b), and the value of Δ can
be reduced to 4.2 × (1 − 1/5) = 3.36%.

As described above, executing finer grained management scripts is important
for reducing the value of Δ and thereby achieving closer load-balancing.

Next, we derive the minimum value of the tolerable deviation Δ satisfying
Condition 2 in Section 3.3. It takes 18.7, 21.5, 6.5, and 4.1 seconds on average to
download the management script A_{20,5} to Systems 1, 2, 3, and 4, respectively.
This time is almost the same for any other values of the number of object types
for polling and the polling interval in the implementation. During every download,
the CPU utilization stays at 100% throughout, which implies that these download
times correspond to the processing load per 100 seconds. The maximum processing
load of all the destination systems when a management script is downloaded is
therefore 21.5%, and Δ > 21.5% when T is 100 seconds. In the current implementation,
the tree-structured management information causes a relatively long dynamic
download. This is a major factor in determining the value of Δ. We are now
tuning the implementation and expect that this will alleviate the processing
load in Condition 2. This will result in a smaller value of Δ, so that we can perform
closer load-balancing for a shorter T.

Hence, the minimum value of Δ satisfying both Conditions 1 and 2
simultaneously is 21.5%.
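The derivation in this section can be retraced numerically. The following Python sketch assumes that Condition 1 yields the bound C_max × (1 − 1/N), as used in the calculations above, and that Condition 2 yields the longest download time expressed as a percentage of the period T. The function names are hypothetical, and the returned values are strict lower bounds on Δ rather than admissible settings.

```python
def condition1_bound(c_max: float, n_systems: int) -> float:
    """Lower bound on the tolerable deviation from Condition 1 (in %)."""
    return c_max * (1 - 1 / n_systems)


def condition2_bound(download_seconds, period_s: float) -> float:
    """Lower bound from Condition 2: the longest download, run at 100% CPU,
    expressed as an average load over one period of `period_s` seconds (in %)."""
    return 100 * max(download_seconds) / period_s


def minimum_delta(c_max, n_systems, download_seconds, period_s) -> float:
    """The tolerable deviation must exceed both bounds, hence the larger one."""
    return max(condition1_bound(c_max, n_systems),
               condition2_bound(download_seconds, period_s))


# Values reported in Section 5.2: five systems, T = 100 seconds.
coarse = condition1_bound(65.0, 5)    # one script polling 20 object types
fine = condition1_bound(4.2, 5)       # scripts polling 1 object type each
bound = minimum_delta(4.2, 5, [18.7, 21.5, 6.5, 4.1], 100)
```

With the finer grained scripts the Condition 1 bound drops to about 3.36%, so the download cost in Condition 2 dominates and the overall bound is 21.5%.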

From all the discussion above, with a trivial overhead, the proposed algorithm
can hold the balance of the management tasks and prevent degradation of the
systems' performance. The proposed algorithm is expected to be one of the essential
techniques for making distributed management more promising in
an operational network.

5.3 Discussion on Delay in Monitoring CPU Utilization 

It has been shown theoretically that a large delay in monitoring CPU utilization,
for example, by way of a WAN, might cause inconsistency between the actual and
the monitored values, and there may be cases where dynamic load-balancing
algorithms, including the proposed one, do not behave as well as desired [20].
In our context, this could be solved by providing the proposed algorithm in the
form of a management script and by distributing the management script to each
managed network, for example to a LAN, in which the proposed algorithm does
not suffer from the delay.

The case above is an issue common to all dynamic load-balancing algorithms,
and showing how sensitive they are to the monitoring delay in an operational
network is left for future study.



6 Conclusions 

In this paper, we proposed a new dynamic load-balancing algorithm for
distributed management in active networks. By letting the centralized and
distributed management paradigms coexist, the proposed algorithm holds the balance of
management tasks in the overall network on the basis of 1) the processing load of
management and managed systems, and 2) the bandwidth needed for executing
all management scripts. The proposed algorithm can control the differences in
the average CPU utilization among these systems within a small value (the
tolerable deviation) specified by a network operator, as well as limit the bandwidth
needed to execute all management scripts.

We implemented the proposed algorithm and evaluated its effectiveness from
the following two viewpoints, by applying the implementation to an operational
LAN: 1) how well the proposed algorithm can perform closer load-balancing by
making use of an optimal value of the tolerable deviation, including the
overhead of the proposed algorithm, and 2) how to derive the optimal value of the
tolerable deviation effective in practical operation, by applying a newly
introduced theoretical expression to the processing load of typical management scripts
measured in the LAN. The results showed that 1) the algorithm could achieve
close load-balancing when given the derived value, and 2) the overhead of
the proposed algorithm had scarcely any impact on the management and
managed systems and the network.

With respect to load-balancing for distributed network management, we
obtained the following two findings in the evaluations: 1) to prevent
the continual moves of management scripts triggered by a small difference in
the average CPU utilization between two systems, the value of the tolerable
deviation should be greater than both the maximum processing load of all
the management scripts for all the systems and the maximum processing load of
the destination system when a management script is moved there, and 2) it is
effective to execute finer grained management scripts, for example, scripts polling
a smaller number of pieces of management information.

The proposed algorithm is expected to be one of the essential techniques
for making the distributed management paradigm more promising and feasible.
Tuning the current implementation to alleviate the long dynamic
download and evaluating it with many varieties of management scripts in a
wide area network are left for further study.

We are deeply indebted to Dr. Takuro Muratani, President & CEO of KDD
R&D Laboratories Inc., Dr. Shigeyuki Akiba and Mr. Tohru Asami, Executive
Vice Presidents, for their continuous encouragement of this research.
Abstract. The increased complexity of the service model relative to
store-and-forward routers has made resource management one of the 
paramount concerns in active networking research and engineering. In 
this paper, we address two major challenges in scaling resource manage- 
ment to many-node active networks. The first is the use of market mech- 
anisms and trading amongst nodes and programs with varying degrees of 
competition and cooperation to provide a scalable approach to managing 
active network resources. The second is the use of a trust-management 
architecture to ensure that the participants in the resource management 
marketplace have a policy-driven “rule of law” in which marketplace de- 
cisions can be made and relied upon. We have used lottery scheduling 
and the Keynote trust-management system for our implementation, for 
which we provide some initial performance indications. 



1 Introduction 

Resource management is a major challenge for active networks because of their 
increased flexibility. Most processing in current networks is simple packet for- 
warding, meaning that most packets require the small fixed cost of a routing table 
lookup and a copy. In the grand vision of active networking, the network provides 
services that are customizable by its users. This means that packet processing is 
much more complicated because it is, at least in part, user-specified. Therefore, 
and in the absence of a more sophisticated resource management model, users 
have the potential to unfairly consume shared resources. Furthermore, there is 
no way for users to place demands on the quality (e.g., performance) of these
services. The need for a resource management infrastructure raises four ques- 
tions: 

1. What resources should be managed? 

2. How are management policies specified? 

3. How are policies enforced? 

4. What policies should be used? 

Questions 1, 2 and 3 are well studied. Most researchers agree that an effective 
approach should focus on controlling the physical resources of the network: node 
CPU time, memory, disk space and network bandwidth. Some work has also
been done in policy specification in general [3] and without concrete demon- 
stration for Active Networks [20]. Finally, some projects have examined tech- 
niques for enforcing resource usage, including namespace management [1, 12], 
runtime access control [7], limited expressibility [11, 17], certification [24], and 
fine-grained resource accounting [14, 6]. 

We believe that the central outstanding question in effective resource man- 
agement is question 4, the specification of scalable policies. In this paper, we 
present a solution to this problem, consisting of two components. At the policy 
level, we define a distributed, market-based policy for resource allocation. This 
is in sharp contrast to the more administrative-based policies proposed to date; 
these policies inhibit interaction throughout the network because they are fun- 
damentally local and proprietary. Instead, a market-based approach potentially 
‘opens up’ the entire network, transforming it into an open service market. At 
the mechanism level, we integrate KeyNote [3], a distributed trust-management 
system, into the architecture of active elements. KeyNote is roughly similar to
Java’s SecurityManager [7], in that it is an entity that authorizes decisions of 
policy, but differs in that policy components may be delegated and distributed 
as credentials, thereby facilitating greater scalability. KeyNote serves to specify 
and uphold the ‘rule of law’ that governs market interactions. 

Market-based policies for resource management are not new; they have been 
applied to bandwidth allocation [16], memory allocation [10], and CPU schedul- 
ing [23]. Our approach was inspired by the work of Stratford and Mortier [18], 
who propose a market-based approach to QoS-oriented resource management for 
operating systems. In their work, dynamic pricing is used as a mechanism to en- 
able applications to make policy-controlled adaptation decisions. Contracts may 
be established between users and applications, and between applications and 
resource managers to apportion resources. These contracts may be established 
directly, or via third-party ‘resource traders’. We apply similar market-based 
models to the active networking context. While the areas of trust management, 
market-based systems and active networks are not new, it is their combination 
that constitutes our novel contribution. 

In the remainder of this paper, we present our design of an active network 
infrastructure that implements market-based resource management. In Section 2 
we provide an overview of our approach, elaborating on how market-based con- 
trol and trust management can materialize a scalable framework for addressing 
resource management issues in active networks. Section 3 presents the current 
state of our implementation. Some first experiments and an example applica- 
tion that expose the benefits of our work are described in Section 4. Section 5 
provides impressions, conclusions and directions for further investigation. 

2 Overview 

In order to create a scalable framework for apportioning network resources, we 
need two things: a scalable policy and a scalable way of expressing and enforcing 
that policy. For the former, we look to specify a market-based policy; for the
latter, we use a decentralized trust-management system to express and distribute 
that policy information to the enforcing elements. 

In this section we present an overview of our approach, leaving the details to 
the next section. We begin by explaining the merits of the market approach and 
then describe how we use it in particular. We finish by describing the benefits 
of trust management and how it applies to our system. 



2.1 Market-Based Control 

A market can be defined as "... a set of interacting agents with individual goals
that achieve a coherent global behavior ..." [5]. Interaction takes place in the
form of buying, selling or trading of goods. Each agent implements a strategy for 
achieving its goals through the interactions within the market. Economic theory 
suggests that through the rational behavior of the agents and the competitive 
process, the market converges to equilibrium where supply and demand match, 
and near optimal allocations of goods are achieved. In general, agents can be 
producers of goods, consumers of goods or both. The ones that act as producers 
are concerned with setting prices in order to maximize profit while consumers 
try to maximize perceived utility given specific budget constraints. This ability 
to facilitate fair allocation with very little information (just the price) makes the 
market an attractive framework for solving many complex problems. 

For our purposes, the prime advantage of a market is its lack of centralized 
control. Instead, control emerges from the individual goals and actions of the 
agents and is thus inherently distributed among the elements of the market. The 
decentralization property is crucial in the design of scalable systems: the system 
is allowed to grow and change without bottleneck. 

There are a number of other practical advantages of creating an active net- 
work economy. 

— No assumptions are made on the cooperative or competitive nature of the 
agents and their goals; there can be as many different strategies as agents, 
and each strategy may be more or less “selfish” and non-cooperative. In 
active networks, we also consider selfish and untrusted users to be able to 
execute network programs thereby consuming resources. The market-based 
approach offers flexibility in dealing with different degrees of cooperation 
and competition in the control process. 

— A market is abstract enough to capture resource control problems at any 
scale, from lower layer resources such as CPU and bandwidth to higher layer 
services such as secure connections, streams with guarantees on performance, 
etc. 

— We can build upon this infrastructure to actually charge for network services. 

We apply the market model to active network resource management. We 
define what constitutes the general terms agent, good and trade, as we have 
described above. 
The Active Network Economy In the active network economy, the goods
traded are the physical resources of the active nodes, e.g., CPU, memory, network 
capacity and secondary storage. Typically, the producing agents are the elements 
of the nodes, and the consuming agents are active programs that wish to run on 
the nodes and use the nodes’ elements. We also define a class of agents, called 
service brokers, for mediating access between producers and consumers. Service
brokers peddle resource access rights, the form of currency in our marketplace. 
A resource access right is essentially a promise of a certain sort of service, e.g., 
a periodic slice of processing time, or a fixed amount of network bandwidth. 
Resource access rights are themselves purchased by the users of active programs. 
Service brokers may manage a number of producers’ resources or may in fact be 
implemented as part of the producer. Also, a single service broker can manage 
resources from different nodes across the active network, which constitutes a 
highly valuable feature with regard to our concerns for scalability. 

In our implementation, instead of authorizing access to physical resources 
directly, we authorize access to functions that access those resources. Resource 
access rights generally specify three things: what function may be called, when it 
may be called and any restrictions on its arguments. For example, rather than 
issuing a right to some slice of CPU time, we may issue a right to set a certain 
scheduling priority. This is expressed as a right to call Thread.set_priority
once with an argument that is less than some number. This approach also allows 
us to generalize any active service as a good in our economy, rather than just 
raw resources. On the other hand, if the underlying mechanisms are available 
to enforce access to raw resources directly, as in [14], these rights can also be 
accommodated. 
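The three parts of a resource access right (which function, when, and what argument restrictions) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the `AccessRight` class, its fields, and the choice to model "when" as a remaining-use counter are hypothetical and do not reflect the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AccessRight:
    function: str                   # which function may be called
    arg_ok: Callable[[int], bool]   # restriction on the function's argument
    uses_left: int = 1              # "when": modeled here as remaining calls

    def authorize(self, function: str, arg: int) -> bool:
        """Return True and consume one use if the call is covered by this right."""
        if function != self.function or self.uses_left <= 0:
            return False
        if not self.arg_ok(arg):
            return False            # a rejected call does not consume the right
        self.uses_left -= 1
        return True


# A right to call Thread.set_priority once with an argument below 30.
right = AccessRight("Thread.set_priority", lambda priority: priority < 30)
```

Authorizing function calls rather than raw resources means that the same check applies to any active service exposed as a function, which is the generalization the paragraph above points to.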

Resource access rights are part of a more general policy that governs how 
an active node may be used and by whom. Resource access rights are policy 
that is sold on the market, and can be combined into a framework that also ac- 
commodates administrative policies. In fact, administrative policy might govern 
what market policies are to be made available under what conditions. The next 
question is how to integrate these policies with mechanisms used for enforcing 
them on the active nodes. For this task we have made use of a trust-management 
system, KeyNote, which we describe next. 

2.2 Trust Management 

Since resource access rights are essentially policy rules governing the use of active 
network resources, we must establish some basic requirements for the policy 
scheme we use: 

1. Decentralization in specifying (and enforcing) policies and access control 
rules. This is preferable both to the individual entities that wish to specify 
their own policies, as well as to the entire economy, since a central point of 
enforcement for all transactions would yield a system that does not scale. 

2. Flexibility in establishing relationships between entities in the network on
varying time-scales. For example, service brokers may change relationships
with various producers, and therefore the resource access rights that they sell
should reflect the change. Consumers should be able to choose appropriate
producers and buy from whomever they wish.

KeyNote-Version: 2
Authorizer: CPU_BROKER
Licensees: BOB
Conditions: (an_domain == "an_exec" && module0 == "Thread"
             && module1 == "set_prio" && arg1 < 30
             && Sonetime == "yes") -> "ACCEPT";
Signature: "rsa-md5-hex:f00f5673"

Fig. 1. Example credential guaranteeing a user a share of the CPU

Trust Management is a novel approach to solving the authorization and
(security) policy problem, introduced in [4]. Entities in a trust-management system
(called “principals”) are identified by public keys, and may generate signed pol- 
icy statements (which are similar in form to public-key certificates) that further 
delegate and refine the authorization they hold. This results in an inherently de- 
centralized policy system; the system enforcing the policy needs to consider only 
the relevant policies and delegation credentials, which the user has to provide. 

In our system, resource access rights are implemented as policies initially 
authorized by the resource producer. At the outset, these policies are applicable 
to the service brokers, who may then delegate (all or part of) them to the 
consumers who purchase them. Consumers then provide the policy credentials 
to the producer when they want to access the service. 

We have chosen to use KeyNote [3] as our trust management system. KeyNote 
provides a simple notation for specifying both local policies and credentials. 
Applications communicate with a “KeyNote evaluator” that interprets KeyNote 
assertions and returns results to applications. The KeyNote evaluator accepts as 
input a set of local policy and credential assertions and a set of attributes, called 
an “action environment”, that describes a proposed trusted action associated
with a set of public keys (the requesting principals), and finally returns whether 
proposed actions are consistent with local policy. In our system, we use the 
action environment to store component-specific information (such as language 
constructs, resource bounds, etc.) and environment variables such as time of day, 
node name, etc., that are important to the policy management function. 

As an example of a KeyNote credential, Figure 1 shows a resource access
right for a specific share of the CPU, as we described in the last subsection. The
credential indicates that BOB may call Thread.set_prio once, at most, with the
condition that the argument is less than 30.
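
To make the role of the action environment concrete, the following is a minimal sketch of how the conditions of the Figure 1 credential could be checked against a set of action attributes. It is hand-written for illustration and does not use the KeyNote library: the function name, the "REJECT" fallback value, and the plain `onetime` key used for the one-time attribute are all assumptions, and signature verification and delegation chains are left out entirely.

```python
# Illustrative only: a hand-written check of the Fig. 1 conditions.
# It does not use or imitate the real KeyNote library API.

def check_fig1_conditions(env):
    """Return "ACCEPT" if the action environment satisfies the Fig. 1 conditions.

    Action attributes are strings; arg1 is compared numerically.
    Returning "REJECT" otherwise is an assumption of this sketch.
    """
    try:
        arg1 = int(env.get("arg1", ""))
    except ValueError:
        return "REJECT"
    satisfied = (
        env.get("an_domain") == "an_exec"
        and env.get("module0") == "Thread"
        and env.get("module1") == "set_prio"
        and arg1 < 30
        and env.get("onetime") == "yes"  # assumed name for the one-time attribute
    )
    return "ACCEPT" if satisfied else "REJECT"


# Action environment for a proposed call Thread.set_prio(20) by BOB.
request = {
    "an_domain": "an_exec",
    "module0": "Thread",
    "module1": "set_prio",
    "arg1": "20",
    "onetime": "yes",
}
print(check_fig1_conditions(request))                    # argument within the bound
print(check_fig1_conditions({**request, "arg1": "30"}))  # argument violates the bound
```

In the real system the evaluator would also verify that the credential is signed by CPU_BROKER and that CPU_BROKER is itself authorized by local policy; the sketch covers only the conditions field.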

Fig. 2. The ALIEN Architecture and the new components: KeyNote (KN Module)
and the resource schedulers, which are part of the Execution Environment,
and Brokers, which are implemented as active extensions



3 Implementation in ALIEN 

So far we have described our approach in general; in this section we describe our 
implementation, focusing on what policies we use and how they are enforced. The 
space of allowable policies is inherently tied to that which can be enforced. For 
example, a policy stating that a user may only use 10% of the CPU over a certain 
period would be totally useless without a corresponding mechanism to enforce 
that policy. In this sense, the available enforcement mechanisms establish the 
vocabulary for specifying policies. These enforcement mechanisms are themselves 
dependent on what is made available in the implementation, which we describe 
here. 

Our implementation builds on the SwitchWare project’s ALIEN [1] prototype,
whose three-layer architecture is shown in Figure 2. In ALIEN, properly
authorized users may extend node processing with new code, termed active
extensions, using an API defined by the core functions. These core functions present
an interface to node resources, essentially as an abstraction of the OS. The
visible API is controlled by the Active Loader: currently, loaded code is allowed a
smaller API than statically linked functions, enforced at load-time for security
purposes. For example, loaded code does not have access to the local disk.

We have extended ALIEN to provide enforcement for our market-based poli- 
cies, in two ways. First, we have modified the Active Loader to control the visible 
API of active extensions in a more fine-grained manner. Extensions may either 
have full access to a particular core function, partial access (that is, as long as 
certain parameters are within certain limits) or no access. This decision is made 
on a per-extension basis, according to policy. Access control is enforced at load- 
time when possible, or else at run-time. Second, we have exposed some of the 
functionality of the resource schedulers into the core functionality, so they may 
be similarly controlled by the Active Loader. Resource schedulers control
low-level functions, such as thread scheduling and packet scheduling.

For the remainder of this section, we describe these two extensions to ALIEN. 
Then we conclude with some of the details of how we implement Service Brokers. 



3.1 Controlling Active Extensions 

In ALIEN, active packets or extensions are received and handled by the Active 
Loader, which manages the active node execution environment. It is the Active 
Loader’s task to instantiate the code received by dynamically linking it to the 
environment and, if necessary, create a new thread for executing it. 

We extended the dynamic linking process to perform policy compliance
checks on the module references and trigger one of three actions: accept or reject
the reference, indicate that further run-time checking on the reference arguments
is needed, or initiate a policy-based replacement of a reference with a more
appropriate one. This last feature provides a very useful technique for translating
generic references to specific service implementations or service variants,
according to policy. It can be used, for example, to select between different
connection-oriented communication service stacks (Secure vs. Unsecure) or to choose
between guaranteed and best-effort service (by translating references to the Thread
module to either BEThread for best-effort or GSThread for guaranteed service).

This policy enforcement mechanism is implemented as follows. At dynamic 
link time, the linker processes the bytecode object and extracts all references 
to modules and functions that are external to the loaded object. For each such 
reference, the KeyNote evaluator is queried, along with any appropriate creden- 
tials. Each result type is associated with a specific action that has to be taken. 
The currently implemented result types are: 

— ACCEPT: the reference is accepted unconditionally. 

— REJECT: the reference is not allowed. 

— REPLACE: the reference has to be replaced by another reference; e.g., a refer- 
ence to a general packet-sending function is replaced by a rate-limited one. 

— CHECK-ARGS: there are restrictions on the arguments of the reference.

For the first three cases, the linker can perform the check and take the necessary
actions, resulting in no run-time penalty. For the final case, checks must occur
at run-time, since function arguments are dynamic entities. Each such reference
is replaced by an anonymous function. This function contains instructions to
initiate a query to the KeyNote evaluator on the validity of the original
function’s arguments. An ACCEPT/REJECT reply is expected. A REJECT reply
raises an Invalid_Argument exception, while ACCEPT results in the anonymous
function calling the original reference. These interactions with the policy
management element are shown in Figure 3.

Fig. 3. Interactions with the KeyNote Trust Management System: Link-time,
Run-time, and application-specific interactions are depicted
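
The four result types can be illustrated with a short sketch of how a linker might resolve a single external reference. This is written in Python rather than in ALIEN's implementation language, and every name in it (`link_reference`, `query_policy`, `query_args`, the module and function names) is hypothetical; the two query callbacks stand in for calls to the KeyNote evaluator.

```python
class InvalidArgument(Exception):
    """Stands in for the Invalid_Argument exception mentioned in the text."""


def link_reference(name, target, query_policy, query_args, replacements):
    """Resolve one external reference according to the policy verdict."""
    verdict = query_policy(name)
    if verdict == "ACCEPT":
        return target                      # linked as-is, no run-time cost
    if verdict == "REPLACE":
        return replacements[name]          # policy-selected substitute
    if verdict == "CHECK-ARGS":
        def guarded(*args):
            # Run-time query on the actual arguments of this call.
            if query_args(name, args) != "ACCEPT":
                raise InvalidArgument(name)
            return target(*args)
        return guarded
    raise PermissionError(f"reference to {name} rejected by policy")


# Hypothetical core functions and policy, for demonstration only.
def set_prio(level):
    return f"priority set to {level}"

def send(packet):
    return "sent"

def send_rate_limited(packet):
    return "sent (rate-limited)"

policy = {"Thread.set_prio": "CHECK-ARGS", "Net.send": "REPLACE"}
query_policy = lambda name: policy.get(name, "REJECT")
query_args = lambda name, args: "ACCEPT" if args[0] < 30 else "REJECT"
replacements = {"Net.send": send_rate_limited}

linked_prio = link_reference("Thread.set_prio", set_prio,
                             query_policy, query_args, replacements)
linked_send = link_reference("Net.send", send,
                             query_policy, query_args, replacements)
print(linked_prio(10))
print(linked_send("payload"))
```

Only the CHECK-ARGS branch leaves any work for run-time: the wrapper closes over the original function, which mirrors the anonymous function described above.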



3.2 Resource Schedulers 

The task of resource schedulers is to coordinate access and control allocation 
of shared resources such as CPU, memory, storage or network bandwidth. The 
ALIEN implementation includes support for thread scheduling and memory al- 
location as part of the Active Loader. Packet scheduling can be dealt with in 
a similar fashion; however, since ALIEN is currently implemented in user-space 
and relies on the operating system for scheduling packets, we do not address 
such issues in this paper. We focus on the use of a market-based hierarchical 
thread scheduler, and its interaction with the policy management and service 
broker functions. 

Hierarchical scheduling enables the coexistence of different scheduling dis- 
ciplines to fit the various application requirements. In practice, clients of the 
scheduling process can be schedulers themselves, thus forming a scheduling tree. 
It is also possible for users to attach their own schedulers to the hierarchy in 
order to manage some share of the processing power. 

At the root of the hierarchy, we use a market-based algorithm called lottery 
scheduling [23]. In lottery scheduling, the notion of tickets [22] is introduced. 
These are data structures that encapsulate resource access rights. Each ticket is 
expressed in a certain currency that is significant to the issuer and the holders. In 
the case of lottery scheduling, tickets represent a share of the available processing 
power. At each scheduling interval, tickets participate in a drawing. Each ticket’s 
probability of being selected is proportional to its share value. When a ticket is 
selected, its owner is scheduled for execution^. 

^ One anonymous reviewer of non-technical background commented on this: “Remind
me not to buy one of your lottery tickets!”

We implemented two kinds of second-level schedulers to provide best-effort
and guaranteed-share service disciplines. They are both based on lottery
scheduling but differ in the way they manage their tickets. The guaranteed-share
scheduler has to maintain a constant sum of ticket values that corresponds to a
specific share of the processing power. It may issue tickets to its client threads,
but the sum of the ticket values has to be conserved. In contrast, the best-effort
scheduler may continue to issue new tickets without maintaining a constant
sum. The new tickets cause inflation: the share value of each ticket drops, and so
does the performance of its owning thread. It is possible to implement price-adaptive
scheduling under the best-effort scheduler by monitoring performance and
buying more tickets according to a utility function. A deadline-based scheduler can
also be built under the top-level lottery scheduler: its task would be to acquire
and release tickets according to its current demand for meeting deadlines. An
example scheduler hierarchy in our system is shown in Figure 4.

Fig. 4. A typical scheduling hierarchy
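
The following sketch illustrates the two ideas above: a proportional lottery draw, and the way a two-level hierarchy turns root tickets and per-class tickets into an effective CPU share per thread. It is a simplified model, not the ALIEN scheduler; the function names and the per-thread ticket values are made up, while the 500:300 split between the two classes is the one used in Section 4.1.

```python
import random


def draw(tickets, rng):
    """Pick one owner with probability proportional to its ticket value."""
    if not tickets:
        raise ValueError("no tickets to draw from")
    total = sum(tickets.values())
    pick = rng.random() * total
    running = 0.0
    for owner, value in tickets.items():
        running += value
        if pick < running:
            return owner
    return owner  # floating-point edge case: fall back to the last owner


def effective_shares(root, classes):
    """Share of the CPU each thread receives in a two-level hierarchy.

    root:    {class name: tickets held at the root scheduler}
    classes: {class name: {thread name: tickets within that class}}
    """
    root_total = sum(root.values())
    shares = {}
    for cls, threads in classes.items():
        cls_total = sum(threads.values())
        if cls_total == 0:
            continue  # an empty class has no threads to run
        for thread, value in threads.items():
            shares[thread] = (root[cls] / root_total) * (value / cls_total)
    return shares


root = {"GSThread": 500, "BEThread": 300}
classes = {
    "GSThread": {"t1": 100, "t2": 100, "t3": 200},   # hypothetical ticket values
    "BEThread": {"b1": 50, "b2": 50},
}
before = effective_shares(root, classes)
classes["BEThread"]["b3"] = 50                        # best-effort inflation
after = effective_shares(root, classes)
print(before["b1"], after["b1"])   # the best-effort share drops
print(before["t3"], after["t3"])   # the guaranteed share is unaffected
print(draw(root, random.Random(1)))
```

Issuing a new best-effort ticket changes only the denominator inside the BEThread class, which is why guaranteed-share threads stay isolated from arrivals in the best-effort class.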

The ALIEN execution environment provides an interface to the built-in 
scheduler in the Thread module. For the lottery scheduler, we retained the same 
interface, with one new function added, set_ticket, which sets the tickets of 
the current thread to a specific type and value. Credentials may control how 
this function is called and with what arguments. However, it is the scheduler’s 
task to interpret the set_ticket call according to its own semantics. The call 
might simply cause inflation, trigger a call to acquire more resources from the 
parent scheduler or be rejected. We implemented the lottery scheduler so that 
it is customizable and reusable. Under appropriate circumstances, users may 
override default functionality and implement the ticket management functions 
themselves. 

3.3 Service Brokers

Service brokers provide the necessary credentials for enabling a service, and 
encapsulate service management functions. Conceptually, this function is equiv- 
alent to the trading function as described in [15, 13, 8, 9]. The Internet Diffserv 
architecture [2] also considers the Bandwidth Broker [21, 19] for managing net- 
work bandwidth. From the policy perspective, brokers are principals to which 
authority is delegated for managing a specific policy subspace. The implementa- 
tion issues here are how the broker that is responsible for a service is identified 
and how users can communicate with brokers. 

Since authority over some policy subspace is delegated to some broker, there 
exists information in the policy management system — in the form of creden- 
tials — that indicates this delegation. This information can be obtained by users 
and, through this service-principal-broker association, users can then establish 
communication with the broker. We implemented a broker communication mod- 
ule (BCM) whose task is to relay communication between brokers and users. 
The BCM allows users to find the appropriate broker (among those that have 
registered with the BCM) for the resources they need to consume, query those 
brokers on the availability and price of various services and acquire such services 
by providing the appropriate credentials. 
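
As a rough illustration of the BCM's role, the sketch below models brokers that register with a relay, and a user who looks up brokers for a service, compares prices and acquires a credential. The class and method names are hypothetical, the credential is an unsigned placeholder dictionary, and choosing the cheapest broker is just one possible user strategy, not something the implementation prescribes.

```python
class Broker:
    """A service broker offering services at fixed prices (hypothetical)."""

    def __init__(self, name, offers):
        self.name = name
        self.offers = dict(offers)        # {service name: price}

    def quote(self, service):
        return self.offers.get(service)   # None if the service is not offered

    def sell(self, service, buyer):
        # Placeholder for a signed KeyNote credential delegating the right.
        return {"authorizer": self.name, "licensee": buyer, "service": service}


class BrokerCommunicationModule:
    """Relays requests between users and registered brokers."""

    def __init__(self):
        self._brokers = []

    def register(self, broker):
        self._brokers.append(broker)

    def find(self, service):
        return [b for b in self._brokers if b.quote(service) is not None]

    def acquire_cheapest(self, service, buyer):
        candidates = self.find(service)
        if not candidates:
            raise LookupError(f"no registered broker offers {service}")
        best = min(candidates, key=lambda b: b.quote(service))
        return best.sell(service, buyer)


bcm = BrokerCommunicationModule()
bcm.register(Broker("CPU_BROKER_A", {"GSThread": 12, "BEThread": 3}))
bcm.register(Broker("CPU_BROKER_B", {"GSThread": 10}))
print(bcm.acquire_cheapest("GSThread", "BOB"))
```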



4 System Demonstration 

In this section we share some first experiences with our resource management 
system. First, we focus on the market-based CPU scheduler and demonstrate its 
ability to meet arbitrary user demands in various situations. We then describe 
the implementation of an active web proxy that exploits all different aspects of 
our architecture to their full extent. 



4.1 Evaluation of the Market-Based Scheduler 

In this experiment, a number of threads execute on the active node and consume
CPU cycles. At initialization, each thread is assigned a fixed number of
tickets for a specific scheduling class. Since we implemented two service classes,
GSThread and BEThread, their schedulers hold tickets of the root scheduler:
GSThread is allocated 500 and BEThread 300, which denotes a 5:3 ratio of CPU
resource shares between these two classes. The starting times, service class and
ticket amount for each thread in our scenario and the consumption of CPU cycles
over time for all threads are shown in Figure 5.

We observe that the three guaranteed service threads are successfully isolated 
from the other threads that arrive over time. Also, after the best-effort class 
becomes empty, its share also becomes available. Note that thread 3 takes double
the share that threads 1 and 2 have, exactly as defined by their ticket values. Note
also that fairness is preserved in the best-effort class, in that all threads receive
an equal share of service since they have equal ticket values.

This simple experiment demonstrates thread isolation and differential service
quality. However, no interactions take place between schedulers and threads, ex- 
cept for acquiring tickets at thread creation time. The next experiment features 
threads that dynamically acquire or release tickets according to a specific goal. 
Threads wish to maintain constant performance, while being clients of the best- 
effort service class. Periodically the threads evaluate their performance (process- 
ing rate) and readjust their ticket allocation to reflect their goal. Other threads 
that have no such requirements acquire tickets only at creation time. The
characteristics of threads in this experiment and the threads’ CPU consumption over
time are shown in Figure 6.
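
The adaptation loop can be sketched under a simple model in which a thread holding t tickets, competing with T other tickets in the same class, receives the share t / (t + T). The readjustment rule below (scale the holding by target/measured) is an assumption chosen for illustration; the text does not specify the rule the threads actually use. Under this model the holding converges to t = s·T / (1 − s) for a target share s.

```python
def measured_share(own, others):
    """Share of the class a thread obtains with `own` tickets against `others`."""
    return own / (own + others)


def readjust(own, measured, target):
    """Proportional rule (an assumption): scale tickets by target/measured."""
    return own * target / measured


def settle(own, others, target, rounds=50):
    """Repeat measure-and-readjust for a number of evaluation periods."""
    for _ in range(rounds):
        own = readjust(own, measured_share(own, others), target)
    return own


# A thread wants 25% of the best-effort class while others hold 300 tickets.
print(settle(own=50, others=300, target=0.25))
# If the other threads inflate to 600 tickets, it must hold twice as many.
print(settle(own=50, others=600, target=0.25))
```

With a 25% target the holding approaches 100 tickets against 300 competing tickets and 200 against 600, since each round shrinks the distance to the fixed point by the factor s.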

One can clearly see the threads that try to adapt to the environment. Figure 7
shows the number of tickets for each thread. More elaborate strategies can be
used that consider budget constraints and utility. It is also possible to allocate
a GSThread ticket share as a secured minimum performance and BEThread
tickets for further performance gains. Through these two simple experiments we
give some insight into the flexibility, scalability, and applicability that a
market-based approach to lower-level resource scheduling offers our system. In the
next section, instead of focusing on a specific function of our system, we
use all the components we implemented.



Fig. 7. Ticket allocation over time for the 4 threads that dynamically allocate
CPU power

4.2 The Active Web Proxy

To validate our system design and evaluate its impact, we implemented a web 
proxy server as an active extension that is loaded by users on active nodes. The 
following characteristics of this application are essential: 

— The user must be authorized to install the active extension, and the active 
extension should be able to access the modules and functions it needs. Such 
authority is granted to the user by some administrator, or by interaction with 
the market, for acquiring credentials. Users must then provide the necessary 
credentials for the web proxy to be able to act on their behalf, i.e., with 
their own resource access rights. 

— Network-level access control is needed to indicate which server addresses the 
proxy is allowed to connect to as well as which client addresses are allowed 
to connect to the proxy. In the active node’s local policy, the Tcp module 
is mapped to the Kntcp module (through a REPLACE credential), a wrapper 
module to Tcp. Kntcp is implemented so that every connection is subject to 
access checks. This could also be done using a CHECK-ARGS policy to check the 
arguments of the connect, accept or bind functions. Credentials supplied 
by the user authorize the proxy to “talk to” certain addresses. 

— CPU resources are consumed and should therefore be controlled through the 
market-based scheduler and the service broker function. Determining the ap- 
propriate brokers and credentials to use is implemented in the exception han- 
dlers that deal with failures during linking. These handlers contact the BCM 
module to determine the relevant available services and costs, and acquire 
the relevant credentials (which authorize the proxy to link with the selected 
thread-scheduling service) from the selected broker. The linking process is 
then restarted with the newly-acquired credentials until another exception 
is raised or linking is successfully completed. 

The active web proxy can then proceed to service requests. While this ex- 
periment proves our concept, its development provided a few interesting ob- 
servations. First, directly writing a large number of KeyNote credentials might 
not be the easiest way to specify policy. A higher-level language that can then 
be compiled into a number of KeyNote credentials could be more useful. Sec- 
ond, a potential performance bottleneck that could negatively affect scalability 
is the on-demand creation of credentials by the service brokers. On a 500 MHz
Pentium III processor, signing a credential takes approximately 8 msec, which
corresponds to 124.36 credentials/second^. There are methods to overcome this
limitation, such as having pre-issued credentials, using lighter cryptographic
algorithms or employing hardware support.

^ Verifying a credential is much faster, at 10.71 µsec, or 93457.94 credentials/second.

5 Conclusions

We have addressed the problem of scalable resource management in active
networking and, based on the scalability of market-based mechanisms, developed
a novel system architecture for allocating and adjusting resource allocations
in a system of communicating active network nodes. We have used a trust- 
management policy system, KeyNote, which allows us to ensure that resource 
allocations are controlled and enforced under specified policy constraints. We be- 
lieve that the resulting system is the first system which has provided a network- 
level resource management framework for active networking, moving beyond the 
node architecture considerations which have occupied much of the design efforts 
in first-generation active networking research. 

We believe that the system architecture described herein has considerable 
applications outside active networking. For example, it might serve as an equally 
powerful resource management paradigm in inter-networks where RSVP or other 
integrated services notions are used to control resources. While we have focused 
on active networking as our immediate concern, we intend to investigate the 
applicability of this system architecture to a wider set of distributed resource 
management problems. We believe that the scalability and security of this system 
are powerful attractions and that these fundamentals can be preserved across 
many changes of the environment. 

Abstract. A novel approach to quality of service control in an active
service network (application layer active network) is described. The 
approach makes use of a distributed genetic algorithm based on the 
unique methods that bacteria use to transfer and share genetic material. 
We have used this algorithm in the design of a robust adaptive control 
system for the active nodes in an active service network. The system 
has been simulated and results show that it can offer clear differentiation 
of active services. The algorithm places the right software, at the right
place, in the right proportions; allows different time dependencies to be
satisfied; and permits simple payment-related increases in performance.



1 Introduction 

To be popular with customers an active service platform must provide some clear 
service quality assurances. Users of an active service network supply the programs 
and policies required for their custom services in transport packets alongside their 
data. Clearly it should be possible for these users to specify the Quality of Service 
(QoS) using any metric that is important to them. The rate of loss of packets carrying 
service requests or policies, and the service response time (latency) are two obvious 
examples. In this paper we discuss the management of QoS in an Application Layer 
Active Network (ALAN) [1] that enables users to place software (application layer 
services) on servers embedded in the network. Despite the obvious virtual networking 
overheads, the resulting end to end service performance will often be significantly 
better than if the services executed in the user's end systems (as at present). For ex- 
ample, a network based conference gateway can be located so as to minimise the la- 
tency of the paths used in the conference, whereas an end system based gateway will 
usually be in a sub-optimal location. 

For the purposes of this work we have assumed that the latency and loss associated 
with the network based servers is significantly greater than the latency and loss associ- 
ated with the underlying network. In the case of latency this is clear - packet handling 
times in broadband routers are around ten microseconds, whilst the time taken to 
move a packet into the user space for application layer processing is a few
milliseconds. In the case of loss the situation is less clear, since currently servers
do not drop requests; they simply time out. However, measurements of DNS lookups [2]
suggest that DNS time-outs occur significantly more frequently than DNS packet
losses, so we feel our assumption is reasonable.

In the next section we briefly describe our active services platform ALAN and its 
associated management system. We then justify our approach to QoS in an ALAN 
environment. We then describe a novel control algorithm, which can control QoS in 
the desired manner. Finally we show the results of some simulations using the novel 
algorithm. The results are very encouraging and illustrate for the first time that a 
distributed AI approach may be a productive QoS management tool in an active serv- 
ices network. However, further work is required before we can justify the use of our 
approach in a working active network. 



2 ALAN 

ALAN [1] is based on users supplying Java-based active code (proxylets) that runs
on edge systems (dynamic proxy servers - DPS) provided by network operators. Mes- 
saging uses HTML/XML and is normally carried over HTTP. There are likely to be 
many DPSs at a physical network node. It is not the intention that the DPS is able to 
act as an active router. ALAN is primarily an active service architecture, and the 
discussion in this paper refers to the management of active programming of intermedi- 
ate servers. Figure 1 shows a schematic of a possible DPS management architecture. 




Fig. 1. Schematic of proposed ALAN design

The DPS has an autonomous control system that performs management functions
delegated to it via policies (scripts and pointers embedded in XML containers).
Currently the control system supports a conventional management agent interface
that can respond to high level instructions from system operators [3]. This
interface is also open to use by users (who can use it to run programs/active
services) by adding a policy pointing to the location of their program and
providing an invocation trigger. Typically the management policies for the program
are included in an XML metafile associated with the code using an XML container,
but users can also separately add management policies associated with their
programs using HTTP post commands. In addition the agent can accept policies from
other agents and export policies to other agents. This autonomous control system
is intended to be adaptive.

360 Chris Roadknight and Ian W. Marshall

Not shown in the figure are some low level controls required to enforce sharing of 
resources between users, and minimise unwanted interactions between users. There is 
a set of kernel level routines [4] that enforce hard scheduling of the system resources 
used by a DPS and the associated virtual machine that supports user supplied code. In 
addition the DPS requires programs to offer payment tokens before they can run. In 
principle the tokens should be authenticated by a trusted third party. At present these 
low level management activities are carried out using a conventional hierarchical 
approach. We hope to address adaptive control of the o/s kernel supporting the DPS 
in future work. 



3 Network Level QoS 

Currently there is great interest in enabling the Internet to handle low latency traffic 
more reliably than at present. Many approaches, such as intserv [5], rely on enabling 
the network to support some type of connection orientation. This matches the proper- 
ties of older network applications, such as telephony, well. However it imposes an 
unacceptable overhead on data applications that generate short packet sequences. 
Given that traffic forecasts indicate that by the end of the next decade telephony will 
be approximately 5% of total network traffic, and short data sequences will be around 50% of
network traffic, it does not seem likely that connection orientation will deliver optimal 
results. 

A recent alternative has been to propose differentiated services [6], an approach 
that is based on using different forwarding rules for different classes of packet, and 
maintaining the properties of the best class by admission control at the ingress to the 
network. There are difficulties, however:

- Admission control does not work well with short packet sequences [7].

- The proposed algorithms assume Poisson burst intervals, when real traffic is in
  fact fractional Gaussian [8,9] and much harder to predict.

- The performance benefits can only be obtained if the distribution of demand is
  such that only a small proportion of the traffic wishes to use the better
  classes [10].

- The proposals typically include a low loss, low latency class that uses a
  disproportionate share of the available network resources.

Despite the difficulties it is clear that differentiated services is currently the best 
available alternative. It therefore seems advisable to base any proposals for QoS 
management of active services on the diffserv approach. However, it also seems ad- 
visable to modify the approach and attempt to avoid some of the difficulties identified. 

4 Adaptive Approach to Differentiated Active Services

We propose a new approach to differentiating active services, controlled by an adap- 
tive control algorithm. Users can request low latency at the cost of high loss, moder- 
ate latency and loss, or high latency and low loss by adjusting the time to live (ttl) of 
the packets they send. Short ttl packets will experience high loss when the network is 
congested and long ttl packets will experience high delay when the network is con- 
gested. Users cannot request low loss and low delay together. This choice means that 
all the classes of service we support have approximately the same resource cost. As a 
result we do not have to consider complex admission control to ensure a favourable 
demand distribution, and we do not have to allocate significant resources to support a 
minority service. Two adaptations are possible if the performance is reduced by
congestion: either the application sends fewer packets, or the application persists
until an application specific latency cut-off is reached and then terminates the
session. Services such as telephony would use a low latency/high loss transport
regime. This would require the application to be more loss tolerant than at present;
however, as mobile telephones demonstrate, this is not hard to achieve.
Interoperation with legacy telephones could be achieved by running loss tolerant
algorithms (e.g. FEC) in the
PSTN/IP gateway. We do not believe that users want an expensive low loss, low 
latency service. The current PSTN exemplifies this service and users are moving to 
VoIP as fast as they are able, despite lower quality, in order to benefit from reduced 
prices. 

Near optimal end to end performance across the network is obtained by enabling 
the servers to retain options in their application layer routing table for fast path,
medium path and slow path (i.e. high loss, medium loss and low loss). Packets are then
quickly routed to a server whose performance matches their ttl. This avoids any need 
to perform flow control and force sequences of packets to follow the same route. 

For this approach to work well the properties of the servers must adapt to local load 
conditions. Fast servers have short queues and high drop probabilities, slow servers 
have long queues and low drop probabilities. If most of the traffic is low latency the 
servers should all have short buffers and if most of the demand is low loss the servers 
should have long buffers. Adaptation of the buffer length can be achieved using an 
adaptive control mechanism [11], and penalising servers whenever a packet in their 
queue expires. Use of adaptive control has the additional advantage that it makes no 
assumptions about traffic distributions, and will work well in a situation where the 
traffic has significant Long Range Dependency. This then resolves the final difficulty 
we noted with the current network level diffserv proposals. 
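
The relationship between queue length, loss and expiry can be illustrated with a toy server model. This is not the genetic-algorithm controller developed in the following sections: the class name and the update rule (shorten the buffer by one when a queued packet expires, lengthen it by one when a packet is dropped on arrival) are assumptions made purely to show the penalty idea in a few lines.

```python
from collections import deque


class AdaptiveServer:
    """Toy server whose buffer length reacts to expiries and drops."""

    def __init__(self, buffer_len, min_len=1, max_len=64):
        self.buffer_len = buffer_len
        self.min_len = min_len
        self.max_len = max_len
        self.queue = deque()      # holds the expiry deadline of each packet
        self.dropped = 0
        self.expired = 0

    def offer(self, ttl, now):
        """Enqueue a packet that expires at now + ttl; drop it if the buffer is full."""
        if len(self.queue) >= self.buffer_len:
            self.dropped += 1
            # Assumed rule: a drop suggests the buffer is too short.
            self.buffer_len = min(self.max_len, self.buffer_len + 1)
            return False
        self.queue.append(now + ttl)
        return True

    def serve(self, now):
        """Serve the head of the queue; returns None if the queue is empty."""
        if not self.queue:
            return None
        deadline = self.queue.popleft()
        if deadline < now:
            self.expired += 1
            # Penalty: a packet expired while queued, so shorten the buffer.
            self.buffer_len = max(self.min_len, self.buffer_len - 1)
            return False
        return True


server = AdaptiveServer(buffer_len=2)
server.offer(ttl=1, now=0)     # short-ttl packet
server.offer(ttl=10, now=0)    # long-ttl packet
server.offer(ttl=10, now=0)    # buffer full: dropped, buffer grows
print(server.buffer_len)
server.serve(now=5)            # the short-ttl packet has expired: buffer shrinks
print(server.buffer_len)
```

A server that mostly sees short ttl traffic is pushed towards a short queue with a high drop probability, and one that mostly sees long ttl traffic towards a long queue, without any assumption about the traffic distribution.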



5 Adaptive Control 

Unlike conventional control, adaptive control [11] is based on learning and adaptation. The idea is to
compensate for lack of knowledge by performing experiments on the system and 
storing the results (learning). Commonly the experimental strategy is some form of 
iterative search, since this is known to be an efficient exploration algorithm.
Adaptation is then based on selecting some actions that the system has learnt are useful using
some selection strategy (such as a Bayesian estimator) and implementing the selected 
actions. Unlike in conventional control, it is often not necessary to assume the actions 
are reliably performed by all the target entities. This style of control has been pro- 
posed for a range of Internet applications including routing [12], security [13,14], and 
fault ticketing [15]. As far as we are aware the work presented here is the first appli- 
cation of distributed adaptive control to service configuration and management. 

Holland [16] has shown that Genetic Algorithms (GAs) offer a robust approach to 
evolving effective adaptive control solutions. More recently many authors [17,18,19] 
have demonstrated the effectiveness of distributed GAs using an unbounded gene pool 
and based on local action (as would be required in a multi-owner internetwork). 
However, many authors, starting with Ackley and Littman [20], have demonstrated 
that to obtain optimal solutions in an environment where significant changes are likely 
within a generation or two, the slow learning in GAs based on mutation and inheri- 
tance needs to be supplemented by an additional rapid learning mechanism. Har- 
vey [21] pointed out that gene interchange (as observed in bacteria [22,23]) could 
provide the rapid learning required. This was recently demonstrated by Fu- 
ruhashi [24] for a bounded, globally optimised GA. In previous work [25] we have 
demonstrated that a novel unbounded, distributed GA with “bacterial learning” is an 
effective adaptive control algorithm for the distribution of services in an active service 
provision system derived from the application layer active network (ALAN). In this 
paper we demonstrate for the first time that our adaptive control algorithm can deliver 
differentiated QoS in response to user supplied metrics. 



5.1 Algorithm Details 

Our proposed solution makes each DPS within the network responsible for its own 
behaviour. The active service network is modelled as a community of cellular auto- 
mata. Each automaton is a single DPS that can run several programs (proxylets) re- 
quested by users. Each proxylet is considered to represent an instance of an active 
service. Each member of the DPS community is selfishly optimising its own (local) 
state, but this 'selfishness' has been proven as a stable model for living organisms [26]. 
Partitioning a system into selfishly adapting sub-systems has been shown to be a vi- 
able approach for the solving of complex and non-linear problems [27]. 

In this paper we discuss results from an implementation that supports up to 10 ac- 
tive services. The control parameters given below are examples provided to illustrate 
our approach. Our current implementation has up to 1000 vertices connected on a 
rectangular grid (representing the network of transport links between the dynamic 
proxy servers). Each DPS has an amount of genetic material that codes for the rule set 
by which it lives. There is a set of rules that control the DPS behaviour. There is also a 
selection of genes representing active services. These define which services each node 
will handle and can be regarded as pointers to the actual programs supplied by users. 
The service genes also encode some simple conditionals that must be satisfied for the 
service to run. Currently each service gene takes the form {x,y,z} where:

x. is a character representing the type of service requested (A-J)

y. is an integer between 0 and 200 which is interpreted as the value in a statement
of the form "Accept request for service [Val(x)] if queue length < Val(y)".

z. is an integer between 0 and 100 that is interpreted as the value in a statement of
the form "Accept request for service [Val(x)] if busyness < Val(z)%".
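The gene format above can be sketched in a few lines of Python. This is only an illustration: the class and field names are our own, and we assume that both conditionals must hold at once for a request to be accepted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceGene:
    """Hypothetical encoding of a service gene {x,y,z}."""
    service: str       # x: service type, a character in 'A'..'J'
    max_queue: int     # y: 0..200, upper bound on queue length
    max_busyness: int  # z: 0..100, upper bound on busyness (percent)

    def accepts(self, requested: str, queue_length: int, busyness: float) -> bool:
        # Assumption: the service runs only if BOTH conditionals are satisfied.
        return (requested == self.service
                and queue_length < self.max_queue
                and busyness < self.max_busyness)


gene = ServiceGene("A", 40, 50)
print(gene.accepts("A", queue_length=10, busyness=30))
```

A DPS holding several genes would accept a request if any one of its active genes accepts it.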

The system is initialised by populating a random selection of network vertices with 
DPSs (active nodes), and giving each DPS a random selection of the available service 
genes. Requests are then entered onto the system by injecting a random sequence of 
characters (representing service requests), at a mean rate that varies stochastically, at 
each vertex in the array. If the vertex is populated by a DPS, the items join a queue. 
If there is no DPS the requests are forwarded to a neighbouring vertex. The precise 
algorithm for this varies and is an active research area; however, the results shown here
are based on randomly selecting a direction in the network and forwarding along that 
direction till a DPS is located. This is clearly sub-optimal but is easy to implement. 
The traffic arriving at each DPS using this model shows some Long Range Depend- 
ency (LRD), but significantly less than real WWW traffic. Increasing the degree of 
LRD would be straightforward. However, the necessary change involves additional 
memory operations that slows down the simulation and makes the results harder to 
interpret. In any case inclusion of significant LRD would not change the qualitative 
form of the main results since the algorithm is not predictive and makes no assumptions
regarding the traffic pdf. Each DPS evaluates the items that arrive in its input
queue on a FIFO principle. If the request at the front of the queue matches an avail- 
able service gene, and the customer has included payment tokens equal to (or greater 
than) the cost for that service in the DPS control rules, the service will run. In the 
simulation the request is deleted and deemed to have been served, and the node is 
rewarded by a value equal to the specified cost of the service. If there is no match the 
request is forwarded and no reward is given. In this case the forwarding is informed 
by a state table maintained by the DPS using a node state algorithm. Packets with a 
short ttl are forwarded to a DPS with a short queue and packets with a long ttl are 
forwarded to a DPS with a long queue. Each DPS is assumed to have the same proc- 
essing power, and can handle the same request rate as all the others. In the simulation 
time is divided into epochs (to enable independent processing of several requests at 
each DPS before forwarding rejected requests). An epoch allows enough time for a 
DPS to execute 3-4 service requests, or decide to forward 30-40 (i.e. forwarding in- 
curs a small time penalty). An epoch contains 100 time units and is estimated to rep- 
resent O(100)ms. The busyness of each DPS is calculated by combining the busyness 
at the previous epoch with the busyness for the current epoch in a 0.8 to 0.2 ratio, and 
is related to the revenue provided for processing a service request. For example, if the
node has processed three requests this epoch (25 points each) it would have 75 points
for this epoch. If its previous cumulative busyness value was 65, then the new cumulative
busyness value will be 0.8 × 65 + 0.2 × 75 = 67. This method dampens any sudden changes in
behaviour. A brief schematic of this is shown in figure 2.
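The busyness update described above is an exponentially weighted average. A minimal sketch follows; the function and parameter names are illustrative and not taken from the implementation.

```python
def update_busyness(previous: float, epoch_points: float, memory: float = 0.8) -> float:
    """Combine the previous cumulative busyness with this epoch's busyness.

    `memory` is the weight of the previous value (0.8 in the text), so the
    current epoch contributes with weight 1 - memory (0.2).
    """
    return memory * previous + (1.0 - memory) * epoch_points


# Worked example from the text: three requests at 25 points each, previous value 65.
new_value = update_busyness(previous=65, epoch_points=3 * 25)
print(round(new_value))
```

With these inputs the result is 67 up to floating point rounding, matching the example in the text.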
Fig. 2. Future Network Model

The DPS also has rules for reproduction, evolution, death and plasmid migration. It 
is possible to envisage each DPS as a bacterium and each request for a service as 
food. The revenue earned when a request is handled is then analogous to the energy
released when food is digested. This analogy is consistent with the metabolic diver- 
sity of bacteria, capable of using various energy sources as food and metabolising 
these in an aerobic or anaerobic manner. 

Genetic diversity is created in at least 2 ways, mutation and plasmid migration. 
Mutation involves the random alteration of just one value in a single service gene, for 
example:

"Accept request for service A if DPS < 80% busy" could mutate to "Accept request
for service C if DPS < 80% busy" or alternatively could mutate to "Accept request
for service A if DPS < 60% busy".
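Mutation as described above can be sketched as follows. Genes are represented here as plain (x, y, z) tuples, and the choice of which value to alter is assumed to be uniform; neither detail is specified in the text.

```python
import random

SERVICES = "ABCDEFGHIJ"  # service types A-J


def mutate(gene, rng):
    """Return a copy of gene (x, y, z) with exactly one field redrawn at random.

    The redrawn value may coincide with the old one, in which case the
    mutation has no visible effect.
    """
    service, max_queue, max_busyness = gene
    field = rng.randrange(3)
    if field == 0:
        service = rng.choice(SERVICES)
    elif field == 1:
        max_queue = rng.randint(0, 200)
    else:
        max_busyness = rng.randint(0, 100)
    return (service, max_queue, max_busyness)


rng = random.Random(0)
print(mutate(("A", 40, 80), rng))
```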

Plasmid migration involves genes from healthy individuals being shed or replicated 
into the environment and subsequently being absorbed into the genetic material of less 
healthy individuals. If plasmid migration doesn't help weak strains increase their 
fitness they eventually die. If a DPS acquires more than 4-6 service genes through 
interchange the newest genes are repressed (registered as dormant). This provides a 
long term memory for genes that have been successful, and enables the community to 
successfully adapt to cyclic variations in demand. Currently, values for queue length 
and cumulative busyness are used as the basis for interchange actions, and evaluation 
is performed every five epochs. Although the evaluation period is currently fixed 
there is no reason why it should not also be an adaptive variable. 
If the queue length or busyness is above a threshold (both 50 in this example), a
random section of the genome is copied into a 'rule pool' accessible to all DPSs. If a 
DPS continues to exceed the threshold for several evaluation periods, it replicates its 
entire genome into an adjacent network vertex where a DPS is not present. Healthy 
bacteria with a plentiful food supply thus reproduce by binary fission. Offspring pro- 
duced in this way are exact clones of their parent. 

If the busyness is below a different threshold (10), a service gene randomly selected 
from the rule pool is injected into the DPS's genome. If a DPS is 'idle' for several
evaluation periods, its active genes are deleted. If dormant genes exist, these are
brought into the active domain; if there are no dormant genes the node is switched off.
This is analogous to death by nutrient deprivation. 

So if a node with the genome {a,40,50/c,10,5} has a busyness of >50 when ana- 
lysed, it will put a random rule (e.g. c,10,5) into the rule pool. If a node with the ge- 
nome {b,2,30/d,30,25} is later deemed to be idle it may import that rule and become 
{b,2,30/d,30,25/c,10,5}. 
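The interchange step in this example can be sketched as below. This is a simplification under stated assumptions: a single gene (rather than a random section of the genome) is shed, and reproduction, gene repression and node death are omitted. The names HIGH, LOW and evaluate are illustrative.

```python
import random

HIGH = 50  # shedding threshold for queue length and busyness (example value)
LOW = 10   # idle threshold for busyness (example value)


def evaluate(genome, queue_length, busyness, rule_pool, rng):
    """One evaluation of a DPS; returns its (possibly extended) genome."""
    if queue_length > HIGH or busyness > HIGH:
        # Busy node: copy a randomly chosen gene into the shared rule pool.
        rule_pool.append(rng.choice(genome))
        return list(genome)
    if busyness < LOW and rule_pool:
        # Idle node: absorb a randomly chosen gene from the rule pool.
        return list(genome) + [rng.choice(rule_pool)]
    return list(genome)


rng = random.Random(1)
pool = []
evaluate([("a", 40, 50), ("c", 10, 5)], queue_length=5, busyness=60, rule_pool=pool, rng=rng)
print(evaluate([("b", 2, 30), ("d", 30, 25)], queue_length=0, busyness=5, rule_pool=pool, rng=rng))
```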



6 Experiments 

The basic traffic model outlined above was adjusted to enable a range of ttls to be 
specified. The ttls used were 4, 7, 10, 15, 20, 25, 30, 40, 50, 100 (expressed in ep- 
ochs). Approximately the same number of requests were injected at each ttl. The 
DPS nodes were also given an extra gene coding for queue length, and penalised by 4 
time units whenever packets in the queue were found to have timed out. A DPS with a 
short queue will handle packets with a short ttl more efficiently since the ttl will not be 
exceeded in the queue and the DPS will not be penalised for dropping packets. Thus 
if local demand is predominantly for short ttl, DPS nodes with short queues will replicate
faster, and a colony of short queue nodes will develop. The converse is true if
long ttl requests predominate. If traffic is mixed a mixed community will develop. In 
figure 3 the red dots represent DPS nodes with long queues, the blue dots represent 
intermediate queues and the green dots represent short queues. It is clear that the 
distribution of capability changes over time to reflect the distribution of demand, in 
the manner described above. 




Fig. 3. Distribution of DPS nodes with short, medium and long queues at three different times

Figure 4 illustrates the differentiated QoS delivered by the network of DPS nodes.
The time taken to process each request is shown on the y axis and the elapsed system
time is shown on the x axis. It can be seen that the service requests with shorter
times to live are being handled faster than those with a longer time to live, as expected.

Fig. 4. Different latencies for requests with differing times to live

Figure 5 shows the expected corollary. More service requests with short ttls are
being dropped. This is due to them timing out, and is the essential down-side to specifying
a short ttl. Although the numbers of requests at each ttl value are roughly equal,
fewer short ttl requests are handled.

In addition to varying the latency and loss associated with service requests users
may also wish to vary the price they are willing to pay. In the basic algorithm it was
assumed that the network provider allocated a reward to each DPS for processing a
service request. We investigated the impact of allowing the DPS to collect a greater
reward. In the modified model the DPS is rewarded by the amount of tokens the user
includes with the request. The traffic input was adjusted so that requests for different
services carried different amounts of payment tokens. Initially the DPS nodes were
rewarded equally (25 'tokens') for each of three services A, B and C. After 500 epochs
the rate of reward is changed so that each DPS is rewarded 4 times as much for
processing service C (40 tokens) as it is for processing service A (10 tokens), with B
staying at 25. This is equivalent to offering users a choice of three prices for a single
service. Fig 6 shows the latency of service requests for the 3 different service types.
It is apparent that within 100 epochs the average latency for providing service C is
reduced while the latency for A is increased. Fig 7 shows that requests for service A
are also dropped (due to timing out) more than requests for service B and C. Before
the change in reward the numbers of DPSs handling each service were similar. After
the reward rate change the plasmids for handling services C and B have spread much
more widely around the network at the expense of the plasmid for the relatively unrewarding
service A. After 1000 epochs the rate of requests for all three services was
returned to the original state. It can be seen, in both figures, that equality in quality of
service, both in terms of loss rate and latency, quickly returned.




Fig. 7. Effects of different charging levels on dropping of requests

These last results indicate that the control method could potentially be used for a
range of user specified parameters. We see no reason why other parameters of interest
could not be added to the model, and are very encouraged by the initial results. In
particular we note that the latencies and loss rates are comparable to those obtained in
many conventional approaches to differentiated services, but many of the difficulties
concerning admission control have been avoided.



7 Conclusions 

Our initial results show that the long-term self-stabilising, adaptive nature of bacterial
communities is well suited to the task of creating a stable community of autonomous
active service nodes that can offer consistent end to end QoS across a network. The
methods used for adaptation and evolution enable probabilistic guarantees for metrics
such as loss rate and latency similar to what can be achieved using more conventional
approaches to differentiated services.
Abstract. In server load balancing where replicated servers are dispersed
geographically and accesses from clients are distributed to replicated
servers, a way of distributing the accesses from clients to an adequate
server plays an important role from the viewpoint of load balancing.
In the paper, we propose a new network paradigm for server load
balancing using active anycast. In active anycast, an end user only sends
its request to a group of servers using an anycast address. When this
request arrives at an active router, it selects an adequate server from the
viewpoint of load balancing and changes the anycast address of a packet 
to the unicast address of the selected server. Thus, the decision which 
server is the best one from the viewpoint of server load balancing is 
made by an active router rather than an end user, so active anycast is 
a network-initiated method. Simulation results show that active anycast 
can accomplish efficient server load balancing, even when only a small part of the
routers are equipped with active network technology.

Keywords: Active Network, Anycast, Server Load Balancing, Web Server 



1 Introduction 

In the Internet, several types of services use replicated servers, which are geo- 
graphically dispersed across the whole network. One typical example is replicated 
server technology in the Web, e.g. mirroring and CgR[I]. The aim of this ap- 
proach is to prevent too many accesses from concentrating to a particular server, 
which causes degradation of the response time of a server itself and congestion 
in the network around that server. 

Many works on how to distribute accesses from users to an adequate server,
e.g. [3], [4], [5], have been published in the research area of distributed systems.
These load-balancing methods consist of a load information acquisition
method and a load distributing method. Almost all proposed schemes for load

H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 371-384, 2000. 
balancing assume that an end-user collects load information of the processing
units and decides which processing unit is to be selected from the viewpoint of load
balancing. Thus, in these approaches, it is an end-user that makes the decision, 
so these are end-user initiated approaches. 

A new communication service, anycast, is now emerging as a network- 
initiated approach. In the usual point-to-point communication supported in the 
Internet, a destination host is identified by the Internet address. On the con- 
trary, an address used in anycast points to a group of servers offering the same 
services as the original server. A router which receives an IP datagram whose 
destination address field includes an anycast address forwards this datagram to 
an output link on the path to an adequate server. An end-user only knows an 
address which denotes a group of servers and sends an access to this group. 
It is the network that decides which server is to be selected, so this is a
network-initiated approach. In all proposed anycast methods, the nearest server 
is selected at a router[6,7,8]. When the nearest server is selected, the load of 
each server is not taken into account in the decision of the server. Thus, it is 
impossible to select an adequate server from the viewpoint of load balancing. 

In the paper, we propose a new network paradigm, “active anycast” , which 
enables a network to select an adequate server from the viewpoint of load bal- 
ancing among candidate servers. Active anycast makes use of active network 
technology [9]-[14]. At a router equipped with active network technology, an individual
operation for each application or for each user can be accomplished.
This means that an active router can operate additional intelligent processing 
as well as conventional routing in the network layer. In active anycast proposed 
in the paper, a new function of selecting an adequate server from the viewpoint 
of load balancing is injected to an active router. Thus, active anycast enables 
network-initiated server selection. 

The remainder of this paper is structured as follows. Section 2 presents an 
anycast service for the Internet and discusses the limitation of conventional any- 
cast. Section 3 describes an active network technology which enables intelligent 
operation to be equipped in a router. Section 4 presents our proposed “active 
anycast” and explains detailed operation of active anycast. Section 5 provides 
simulation results which investigate the efficiency of the proposed active anycast. 
Section 6 concludes this paper. 

2 Anycast 

Anycast technology is a new network-initiated service which facilitates an end- 
user finding the nearest objects geographically distributed over the whole net- 
work. An IP datagram whose destination address is an anycast address is for- 
warded to an output link on the path to the closest objects of the router. So, 
for an end-user, an anycast address seems like a unicast address which indicates 
the closest object. When this object is a server, the anycast technology can be 
used for selection of the closest server without an end-user’s knowing where it
is. Anycast technology will be fully supported in the IP version 6 (IPng). So, all
IP version 6 routers can forward an anycast IP datagram to an adequate object. 

Even when anycast technology can be used, only the closest server can be 
selected in the present proposed anycast method category [6] [7]. When a par- 
ticular server is selected by too many clients as the closest server, degradation 
due to concentration of accesses from end-users cannot be avoided. Thus, im- 
provement of the response time from the viewpoint of load-balancing among 
replicated servers cannot be expected with the present anycast technology. 

3 Active Network 

In a conventional network, it takes a long time until users can be allowed to
use a new technology because the standardization process or the replacement of network
facilities such as routers equipped with new technology takes a long time.
To address these issues, the concept of “active network” emerged from discus- 
sions within the broad Defense Advanced Research Projects Agency research 
community [9]. In active network technology, networks are active in the sense 
that network nodes such as routers can perform customized computations on 
the packet flowing through them. 

In the conventional network, such as the present Internet, routers can only 
perform a network layer operation, e.g. mainly routing. On the contrary, routers 
equipped with active network technology, called active routers, can operate a sophisticated
process customized to each IP datagram or application (Fig. 1). Thus,
active network technology has the potential ability of enabling routers to operate 
some sophisticated control which can be done only by end systems in the con- 
ventional network. Many ideas of applying active network technology to several 
fields are proposed [2]. 

Two types of implementation of active network technology have been proposed [2] [9]:
programmable switches and capsules. In programmable switches, users
inject their custom process into the required routers and the customized process 
is operated to each IP datagram going through these active routers. In capsules, 
every message contains a program which is processed at an active router. 

A programmable switch approach is rather applicable for active anycast pro- 
posed in the paper. As described in detail in the next section, the method of 
server selection is injected to an active router, which enables adequate load bal- 
ancing among servers dispersed geographically. 



4 Active Anycast 

In this section, we describe the detailed operation of the proposed “active any- 
cast”. In active anycast, a router in the network autonomously distributes ac- 
cesses from clients adequately to geographically dispersed servers from the view- 
point of load balancing. The proposed scheme is based on anycast technology. 
Fig. 1. Active Node in Active Network



So we assume that all routers can support anycast technology^. However, we
make only the general assumption about active routers, i.e. not all routers are equipped
with active network technology.

In active anycast, a TCP connection which is initiated by the client is au- 
tonomously set up to an adequate server by an active router. When the client 
has a request to the server, it sends a name resolution query to the Domain 
Name Server (DNS) and gets a resolved anycast address (Step 1. in Fig. 2). This 
anycast address indicates a group of replicated servers (including an original 
server) which offer the same service. The initiating host sends a SYN packet 
whose destination address field indicates anycast address (Step 2). The SYN 
packet is forwarded to an output link on the path to the closest server when 
it arrives at a conventional anycast router (IP version 6 router) (Step 3). When 
the SYN packet with the anycast address arrives at an active router, it chooses 
an adequate server from all the candidate servers of the corresponding service 
from the viewpoint of load balancing. And an active router changes the desti- 
nation address of this SYN packet to the unicast address of the selected server 
(Step 4). Subsequently, the SYN packet is forwarded to the selected server as 
conventional unicast forwarding (Step 5). When the server receives this SYN 
packet, it replies with a SYN+ACK packet (Step 7). And the client sends an ACK
packet after it receives the SYN+ACK packet, which means establishment of the
TCP connection (Step 8). After that, the ordinary information exchange phase 
is started between the server and the initiating client (Step 9). 
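The address rewriting performed at an active router (Step 4) can be sketched as follows. This is an illustration only: the Packet type, the table of anycast groups and the load table are hypothetical, and the least-loaded choice used here is a simple deterministic stand-in for whatever server selection policy is injected into the router.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    syn: bool = False


def active_router_forward(pkt, anycast_groups, load):
    """Rewrite an anycast SYN to the unicast address of a selected server.

    anycast_groups maps an anycast address to its candidate servers;
    load maps each server to its current load. Other packets pass unchanged.
    """
    servers = anycast_groups.get(pkt.dst)
    if pkt.syn and servers:
        chosen = min(servers, key=lambda s: load[s])  # assumed policy: least loaded
        return replace(pkt, dst=chosen)
    return pkt


groups = {"ANY": ["S1", "S2", "S3"]}
load = {"S1": 0.9, "S2": 0.2, "S3": 0.5}
print(active_router_forward(Packet("X", "ANY", syn=True), groups, load))
```

After the rewrite the packet carries an ordinary unicast destination, so downstream routers need no anycast or active support to deliver it.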

In active anycast, a client and a server communicate with each other by applying
conventional point-to-point communications. When conventional anycast is
applied to server systems, subsequent datagrams may be forwarded along a different
path from that of the SYN packet, i.e. the first packet. To overcome this technical problem,
some methods have been proposed [6]. In active anycast, TCP connections are established
with the first packet, a SYN packet, and data is exchanged on this TCP

^ IP multicast, which is broadly used in the present Internet, uses tunneling to hide the
existence of non-multicast routers; “mrouted” creates a virtual environment in which all
routers are multicast routers. It is also possible for anycast technology to create a
virtual environment in which all routers are equipped with anycast technology.
connection. So, all packets corresponding to this TCP connection are forwarded
on the same path and arrive at the same server. This means that there is no
necessity for active anycast to take into account the connection problem arising
in conventional anycast.

The way of collecting load information is also an important technical problem
for active anycast to achieve a good performance of load balancing. There are
several conceivable ways for an active router to get load information from servers.
The simplest way is that each server periodically multicasts load information to
the multicast group whose members are active routers. In this case, the overhead
for periodical load information seems to be a serious problem. As a more efficient
way, load information can be piggybacked on the data exchanged between a
client and a server. When a server sends data to a client, the server adds its own load
information in a piggyback style. As an active router along the path
to the client receives this data, it collects the load information while forwarding this
datagram. With this method, the overhead for exchanging load information is
significantly decreased. Several other ways can also be considered, but we would
like to leave this issue out of the scope of this paper as further research.
In the next section, we would like to show the potential performance of the 
proposed “active anycast” with the assumption that an active router has load 
information of all servers. 

For the implementation of active anycast, there are several technical problems 
to be resolved. Here, we discuss these implementation issues and show some 
solutions. At an active router, the IP destination address denoting anycast is
changed to the IP address of the selected server. For the IP layer, the IP header
checksum is re-calculated at every router because the contents of the IP header,
e.g. the Time To Live (TTL) field, are changed at a router. For the TCP layer, its
header checksum is calculated at the end host by using a pseudo header including
source and destination IP addresses. The TCP checksum is never changed inside
the network in the conventional IP network. Active network technology allows
routers to execute higher layer processing including the transport layer. Thus,
the TCP checksum can be re-calculated at an active router when the destination 
address is changed. 
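The dependence of the TCP checksum on the destination address can be illustrated with the standard Internet checksum computed over the IPv4 pseudo header. The sketch below assumes IPv4 addressing and a segment whose checksum field has been zeroed; the function names are our own and this is not the code of an actual active router.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum of a TCP segment (header + data, checksum field zeroed)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return internet_checksum(pseudo + segment)


# A bare 20-byte SYN header: ports, seq, ack, data offset, flags, window, checksum, urgent.
syn = struct.pack("!HHIIBBHHH", 1234, 80, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
before = tcp_checksum("10.0.0.1", "10.0.0.2", syn)  # destination as sent by the client
after = tcp_checksum("10.0.0.1", "10.0.0.9", syn)   # destination after rewriting
print(before != after)
```

Because the destination address enters the pseudo header, the value computed by the client is no longer valid once the router has rewritten the address, which is why the router must store a freshly computed checksum in the segment.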

Since the anycast address of a SYN packet is changed to the unicast address
of a particular server at an active router, the server receiving this SYN packet
sends a SYN+ACK packet whose source address indicates itself. When the client
sending the SYN packet receives this SYN+ACK packet, the source address (the
server’s address) is different from the anycast address. There can be the following
two solutions for overcoming this problem.

1. TCP/IP is modified so that a SYN+ACK packet can have a different source
address from the IP destination address of a SYN packet only for active anycast
service. TCP/IP modification is only necessary for the client-side. For
the server-side, the TCP/IP protocol works well as in the present style for
active anycast. When the client-side TCP/IP receives a SYN+ACK packet
from the server and it has already sent a SYN packet towards the anycast
address, it accepts this SYN+ACK packet and behaves as if it had already
sent a SYN packet whose destination address is the corresponding server.

2. In the above solution, TCP/IP protocol should be modified. Modification 
of TCP/IP needs replacement of the host communication program. So, it 
is better to avoid this modification. The second solution does not need any 
modification of TCP/IP but only needs modification of the application pro- 
gram. This scheme is based on a proposal [6], in which an anycast address is 
not modified in the network. This means that the limitation of conventional 
anycast, e.g. only the nearest server can be selected, still remains, which is 
different from our proposed scheme. 

When a client sends a SYN packet, it uses the IP record route option. As
shown in Fig. 3(a), the client sends a SYN packet whose destination and
source addresses are an anycast address and the client itself (X in the figure),
respectively. Routers on the path to the server are listed. At an active
router (router B in Fig. 3(a)), the anycast address is changed to the address
of an adequate server (Y in Fig. 3(a)). When a packet is sent from the server
to the sender, i.e. in the backward direction, the IP source routing option is used.
As shown in Fig. 3(b), the server sends a backward packet whose destination
and source addresses are the client (X) and the server (Y), respectively. At the
active router, the source address of a backward packet is changed back to the
anycast address. For backward transmission, the IP source routing option is
used based on a source routing list obtained by the forward transmission,
so a backward packet must visit the active router where the anycast address
was changed. An active router keeps a table of address exchanges for forward
packets and this table is used for address recovery. With this method,
TCP/IP at the server and the clients can work well.

In this method, the application program of the client-side should be modified
to use the record route option for the first packet (SYN packet). For the server-side
application program, modification to use the IP source routing option is
necessary. These modifications of the application program are much more
preferable than a modification of the TCP/IP protocol.
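The table of address exchanges kept at the active router in the second solution can be sketched as below. The class is hypothetical, and keying the table only on the (client, server) address pair is a simplifying assumption; a practical table would presumably also distinguish connections by port numbers.

```python
class AddressExchangeTable:
    """Remembers anycast-to-unicast rewrites so backward packets can be restored."""

    def __init__(self):
        self._table = {}  # (client, server) -> anycast address

    def forward(self, src, anycast_dst, chosen_server):
        """Forward direction: record the exchange and return the new (src, dst)."""
        self._table[(src, chosen_server)] = anycast_dst
        return src, chosen_server

    def backward(self, src, dst):
        """Backward direction: restore the anycast address as source if known."""
        anycast = self._table.get((dst, src))
        return (anycast if anycast is not None else src), dst


table = AddressExchangeTable()
print(table.forward("X", "ANY", "Y"))   # client X -> anycast, rewritten to server Y
print(table.backward("Y", "X"))         # server Y -> client X, source restored
```

With the source restored to the anycast address, the client sees replies from the address it originally contacted, so its TCP/IP implementation needs no change.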
Fig. 3. Application Framework for Active Anycast



5 Performance Evaluation 

In this section we evaluate the potential performance of the proposed active any- 
cast. Active anycast can be applied to any client-server model communications. 
In this section, the performance of active anycast in one typical client-server 
communication, the World Wide Web, is evaluated. 

5.1 Simulation Model 

Replacing all routers in the Internet with active routers would take a very long
time, and the case in which not all routers are active routers should be the general one. So,
we evaluate active anycast with various ratios of active routers.

At first, the basic performance of active anycast is evaluated with a simple network
model, a random graph. Furthermore, the load balancing effect of active
anycast is evaluated for a more sophisticated network model, the tier-based
model [15] shown in Fig. 4, taking into account the hierarchical structure of LAN,
MAN and WAN in the Internet, and the performance effect of the location of the
active router is investigated.




For both the basic network model and the tier-based model, we make the
following assumptions. 

— In the network, there are five replicated servers (including an original server) 
which have the same Web contents. 

— Each server is modeled as an M/M/1 queueing model with a capacity of 1.0
contents/sec.

When the random graph is used as the network model, the following assumptions
are made. 

— The network model of the random graph with 40 nodes is used. Each node 
corresponds to a router. 

— Five servers are located randomly in the network. 

— Accesses to servers are generated as Poisson process. The access arrival rate, 
a parameter of this simulation, indicates aggregate access arrival rate to each 
router from users connected directly to it. 

For the tier-based model, the following assumptions are made.

- Parameters for the tier-based model are as follows.

  • Number of WANs: 1
  • Number of MANs per WAN: 20
  • Number of LANs per MAN: 5
  • Number of routers in the WAN: 20
  • Number of routers per MAN: 5
  • Number of routers per LAN: 1

  With these parameters, the total numbers of WAN routers, MAN routers and
  LAN routers are 20, 100 and 100, respectively.

- At most one server is located in each MAN. A server is connected directly
  to a MAN router.

- Accesses to servers are generated as a Poisson process. The access arrival
  rate in the tier-based model is defined as the aggregate access arrival rate
  at each LAN router and is 3.2 contents/sec.
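
The router totals of the tier-based model follow directly from the per-tier parameters. The short check below reproduces that arithmetic; the variable names are ours.

```python
# Tier-based model parameters as listed in the text.
wans = 1
mans_per_wan = 20
lans_per_man = 5
routers_per_wan = 20
routers_per_man = 5
routers_per_lan = 1

# Totals per tier: multiply the number of networks in a tier by routers per network.
wan_routers = wans * routers_per_wan
man_routers = wans * mans_per_wan * routers_per_man
lan_routers = wans * mans_per_wan * lans_per_man * routers_per_lan

print(wan_routers, man_routers, lan_routers)
```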



When active routers select a server according to a deterministic policy, one
particular server may become overloaded, because all active routers forward
the requests they receive to the same server. To prevent this degradation, a
probabilistic server selection policy has been proposed for distributed
systems [3]. Under the probabilistic policy, a server is selected at random in
such a way that a more lightly loaded server has a higher probability of being
selected. The selection probability is calculated from the server load.
Several calculation methods are possible; in our simulation we apply the
following simple method.

Each server periodically counts the accesses arriving at it and calculates the
access arrival rate in each interval. In this paper we assume that an active
router has load information for every server and that all servers have
equivalent processing capacity. Each active router can then calculate the
target arrival rate to each server, $\lambda_a$, as follows.

$$\lambda_a = \frac{\sum_{s=1}^{n} \lambda_s}{n} \qquad (1)$$

where $n$ is the total number of servers and $\lambda_s$ is the access arrival
rate to server $s$. $\lambda_a$ is calculated on the assumption that the total
request arrival observed in interval $T_{i-1}$ will be similar in the next
interval $T_i$ (Fig. 5). Thus, each active router autonomously distributes the
requests arriving at it so as to equalize the arrival rates of the servers.
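
The target arrival rate of equation (1) is simply the mean of the per-server arrival rates observed in the previous interval. A minimal sketch follows; the function name and the sample rates are hypothetical.

```python
def target_arrival_rate(observed_rates):
    """Equation (1): mean of the per-server arrival rates from the last interval."""
    if not observed_rates:
        raise ValueError("at least one server is required")
    return sum(observed_rates) / len(observed_rates)


# Hypothetical arrival rates for five servers in interval T_{i-1}, in contents/sec.
rates = [0.9, 0.4, 0.6, 0.3, 0.8]
lambda_a = target_arrival_rate(rates)
print(f"target arrival rate per server: {lambda_a:.3f} contents/sec")
```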

When $P_{s,T_i}$ is defined as the probability of selecting server $s$ as the
adequate server in interval $T_i$, it is calculated from $P_{s,T_{i-1}}$ as
follows.

$$P_{s,T_i} = \alpha \, (1 \ldots) \, P_{s,T_{i-1}} \qquad (2)$$

Here $P_{s,T_i}$ satisfies $\sum_{s=1}^{n} P_{s,T_i} = 1$, which determines
$\alpha$.

Fig. 5. Balancing Server Arrival Rate of Accesses



When an active router distributes the accesses arriving at it to the servers
according to the probability calculated in equation (2), the server
utilization will be balanced among all servers.
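
The probabilistic policy can be sketched as an iterative update followed by a weighted random choice. The multiplicative factor used below, target rate divided by observed rate, is an assumption chosen for illustration so that lightly loaded servers gain probability; it should not be read as the exact form of equation (2). All function names are hypothetical.

```python
import random


def update_probabilities(prev_probs, observed_rates):
    """Return new selection probabilities that sum to 1.

    Assumed update rule (illustrative only): scale each previous probability
    by target_rate / observed_rate, then normalise.
    """
    if not prev_probs or len(prev_probs) != len(observed_rates):
        raise ValueError("need one previous probability and one rate per server")
    target = sum(observed_rates) / len(observed_rates)
    eps = 1e-9  # avoids division by zero for an idle server
    weights = [p * target / max(rate, eps)
               for p, rate in zip(prev_probs, observed_rates)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("previous probabilities must not all be zero")
    return [w / total for w in weights]


def select_server(probs, rng=random):
    """Pick a server index at random according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    probs = update_probabilities([0.2] * 5, [0.9, 0.4, 0.6, 0.3, 0.8])
    print([round(p, 3) for p in probs], "chosen server:", select_server(probs))
```

Because every active router applies the same rule independently, no coordination among routers is needed beyond having the servers' load information.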

5.2 Simulation Results 

Figures 6 and 7 show the variance of the server utilization and the average
latency for obtaining contents, respectively, as functions of the ratio of
active routers. These results are obtained for the simple network model with a
random graph. A ratio of 0% corresponds to the case where all routers are
conventional (non-active) anycast routers. Figure 6 shows that the variance of
the server utilization decreases as the ratio of active routers increases,
which indicates that requests passing through an active router are distributed
among the servers in a way that balances their load. When the variance of the
server utilization is high, requests from clients tend to concentrate on a few
servers, which degrades the latency for obtaining contents, as shown in Fig. 7.
These results for the random graph model show a significant load balancing
effect, in terms of the latency for obtaining contents, even when only 20% of
the routers are equipped with active network technology.

To investigate how the location of the active routers affects the overall
performance of active anycast, we use the tier-based model. Figure 8 shows the
average latency characteristics of active anycast. In this figure, the ratios
of active routers in the MAN and in the WAN are varied independently. The
x-axis and y-axis show the ratio of active routers in the MAN and the WAN,
respectively, and the z-axis shows the average latency for obtaining Web
contents. When more than 20% of the routers in the MAN or the WAN are equipped
with active network technology, the average latency is significantly reduced.
Strictly speaking, after reaching its minimum the average latency increases
slightly as the ratios in the MAN and WAN increase further, but this increase
is too small to be seen in the figure. It occurs because an active router takes
only server load balancing into account, so a server located far from the
client may be selected. This leads to a slight degradation of latency, but the
degradation is negligible.

Fig. 6. Variance of Server Utilization Characteristics (Random Graph).
Horizontal axis: ratio of active routers (%).



Figure 9 shows the same characteristics as Fig. 8, but its x-axis and y-axis
give the number of active routers rather than their ratio. With a small number
of active routers in the WAN, the latency can be improved significantly. The
number of SYN packets passing through the WAN routers should be larger than
that passing through the MAN routers, because more IP datagrams are aggregated
at the WAN routers. Thus, active routers in the WAN have more opportunities to
change the anycast address of an IP datagram to the unicast address of a
server. These results show that, for load balancing among replicated servers in
active anycast, it is more efficient to place active routers in the WAN than in
the MAN.

6 Conclusions 

In this paper, we have proposed active anycast, a new network-initiated method
for server load balancing. In the Internet, replicated servers are dispersed in
the network in order to distribute accesses from clients geographically.
However, load balancing among servers is difficult to realize because, in the
conventional client-server model, clients that have no information about the
server load must initiate the access to a server. In active anycast, a router
equipped with active network technology selects an adequate server from the
viewpoint of server load balancing. A client sends its access request, a SYN
packet, to an anycast address. This anycast address indicates a group of
servers in the network that provide the same service. When a SYN packet sent
from a client arrives at an active router, the anycast destination address is
changed to the unicast address of the selected server. Thus, the client does
not need to select a server; the active router autonomously selects an adequate
server from the viewpoint of load balancing, which means that this approach is
purely network-initiated.

Fig. 9. Average Latency Characteristics (Tier-based Model: Number of Active
Routers is the Parameter)



Further research on active anycast includes how an active router obtains load
information from the replicated servers, and how to distribute server accesses
taking into account not only server load but also network congestion.

Abstract. This paper presents a network service that receives individual
application demands and considers network condition information, in order
to achieve the best performance in network resource management and to
provide a network information advisory service to applications. The
growing complexity of the Internet, the increasing number of users and
the variety of applications have made the condition of the network and
the results of execution unpredictable. All of these factors make it
difficult for a network service to accomplish its real target. Here we
propose the Demand Inquiring Service, a network service that provides
information and advisory services to applications while at the same time
managing the utilization of network resources. The goals of the Demand
Inquiring Service are to function as automatic traffic engineering and to
make the network condition transparent to users and applications. The
Demand Inquiring Service stores, computes on and forwards packets, and
offers users and applications a friendlier service.



1 Introduction 

Since the first research and development of the Internet more than 25 years
ago, the network has generally been treated as just another shared resource that
needs to be used and managed. From the application's perspective, the network is
a “black box”: the application does not know what happens inside the box and
sees only its inputs and outputs. As mentioned in [7], the early Internet was
built on the end-to-end concept. Inner nodes such as routers and bridges do no
more than store and forward packets.

With the growth of the Internet, the number of users and the variety of
applications have made the utilization of the shared network resources higher
and more complex. Moreover, some applications need services that can best be
supported or enhanced using information that is only available inside the
network [11]. In other words, if information such as the time and place of
congestion, hot spots in the network, and the location of packet loss is made
available to the application in a timely manner, it will significantly enhance
the service seen by the application. Conversely, if applications can give the
network the information it needs, the network can optimize its service
performance.

For the above reasons, we propose a service system that merges network
information and application information, which we call the Demand Inquiring
Service (DIS). The aim of DIS is to let users or applications freely submit
their demands to the network, and to let the network provide services as users
or applications need them, adaptively and interactively. We do not merely make
the network intelligent, as in the Intelligent Network; instead, using active
network technology [2] [3] [4] [5], we provide an environment in which users or
applications actively participate in deciding when, what and how requests are
carried out. Furthermore, the network can manage its resources to achieve good
service performance. Our work proposes a mechanism and environment that enable
users to become active elements of the network and make the network friendlier
in providing services to users. Here the network does not just store and
forward packets; it stores, computes, forwards and offers hospitality to users
and applications.

The rest of the paper is structured as follows. Section 2 defines the network
services discussed in this paper. Section 3 presents the concepts of DIS.
Section 4 describes the structure of DIS, its parts, the correlation between
them and its relation to other protocols. Section 5 discusses an application of
DIS and the simulation results. Finally, section 6 concludes the paper and
presents issues for further research.



2 Network Services 

In this section, we first define what we mean by network services. A network
service is an activity that the network provides to users or applications in
connection with the data or packet communication between them. From the point
of view of the user, some services are transparent and others are not.
Recently, the trend has been not only toward providing good services, but also
toward controlling the quality of services.

Since there are many kinds of users with a variety of applications, the network
must become more adaptable in the future in order to cater to the wide range of
user demands. There is a need to reconcile the perspectives of the
telecommunication and computing communities in new dynamically programmable
network architectures that support fast service creation and resource
management through a combination of network-aware applications and
application-aware networks [9]. Programmable networks enable the network to
provide services to specific users as they themselves have arranged.

From the point of view of the network, several service models and mechanisms
have been proposed by the Internet Engineering Task Force (IETF). Notable among
them are the integrated services/Resource Reservation Protocol (RSVP) model, the
differentiated services (DS) model, multi protocol label switching (MPLS),
traffic engineering, and constraint-based routing (CBR) [13][14]. These methods
and mechanisms provide reservation or classification of packet transmission
services instead of best effort service. However, all of the services of these
models and mechanisms are out of the users' reach. Here we propose a network
service to which users can submit their demands within the range of a certain
service level agreement.



3 Demand Inquiring Service Concepts 

The basic concepts of the Demand Inquiring Service are to give the best services
to users or applications and, simultaneously, to manage the utilization of
network resources. The first concept means that the network tries to provide
services as close as possible to what users or applications want, taking their
requests into account. The second concept means that the network itself manages
resource utilization to achieve better performance without sacrificing the
performance that users or applications obtain.

Because a network like the Internet consists of various kinds of devices and
links and carries many different applications, its condition varies all the
time. It is difficult for applications to obtain stable services, and for
networks to provide fixed services, when the traffic itself changes
continuously. Therefore, the network must perceive its condition at all times in
order to provide optimum services to users or applications while making optimum
use of network resources. One solution to these problems is for the network to
provide time-variant services to users or applications [6].

Although it is difficult to provide services that exactly match the demands of
users or applications, we propose a method of interactive communication between
users or applications and the network environment, in order to achieve better
network performance and services as close as possible to what users or
applications demand.

The objective of DIS is to provide optimum time-variant network services at all
times by using two kinds of parameters: the requests of users or applications
and the condition of network resources. We use active network technology to
implement the communication between users or applications and the network.

In the current system, the network and applications run independently. An
application requests a service, and whether the network can provide it depends
on the network condition. In DIS, when the user or application sends a DIS
request, the network does not only store and forward the packets, but also
actively checks the feasibility of the requested service and provides
time-variant services to reach the optimal condition for the network itself or
for the users or applications.

In DIS, the application communicates with the nearest active node on which a
DIS server runs. The active nodes exchange information with each other in order
to perceive the condition of the network. Depending on the condition of the
network, and considering the best utilization of network resources, the DIS
server computes the optimum services that can be provided and then contacts the
client application to report the network condition or to give advice concerning
the application request. After receiving this information, the client sends the
application's confirmation to the server. The confirmation can be obtained
either by anticipating the server's possible questions and writing a program
that answers them, or by asking the user interactively. Based on the
confirmation from the client, the DIS server does one of the following:

1. cancels the request, or

2. continues the request and carries out the subsequent processes.

In case 2, the subsequent processes can be carried out by the DIS server itself
or handed over to other service mechanisms such as integrated services/Resource
Reservation Protocol (RSVP), differentiated services, multi protocol label
switching (MPLS), etc.



4 The Structure of DIS 

This section presents the Demand Inquiring Service. First, the structure and the
relative positions of the elements of the Demand Inquiring Service system are
described. The following subsections present the DIS client and the DIS server
in turn. The correlation between the DIS client and server, and the relationship
between DIS and other service methods, are presented in the last two
subsections.



4.1 Network Model 

The Demand Inquiring Service uses a client-server model with two parts, a client
part and a server part. The client part consists of the application and the DIS
client. The server part consists of the DIS server, which runs on an active
node. The application connects to the DIS server via the DIS client.

Figure 1. The Relative Positions of the Elements of DIS on the Network

The Demand Inquiring Service runs above other protocols such as RSVP,
differentiated services and routing protocols; its relative position in the
Internet network layer is depicted in figure 1. This means that the Demand
Inquiring Service does not operate in isolation when providing services, but
cooperates with other protocols to accomplish application requests.



4.2 DIS Client 

An application uses the DIS client to connect to the DIS server. The DIS client
is implemented as a library and serves as the interface between applications
and the DIS server.

A DIS client establishes a connection to the nearest DIS server and conveys the
application's request to it. To establish the connection, the DIS client
creates all DIS-related packets and marks them as DIS packets. To identify a
DIS packet, as an experiment, the DIS client uses the last bit of the two-bit
currently unused (CU) field of the TOS octet in the IPv4 header (see figure 2).
The ID is 1 for a DIS packet and 0 for all others.

Figure 2. Identifier of DIS packet in the header of IPv4

When the DIS client sends a request to the DIS server, a unique number formed
from the combination of the port number, the kind of transport protocol, and
the IP addresses of the sender and receiver is used as the ID of the request.
The DIS client also responds to the decision on request execution received from
the DIS server, based on an instruction from the user or application. The
instruction takes the form of an interactive answer given directly by the user
or a programmed instruction contained in the application.
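
The DIS packet marker and the request ID can be illustrated with a few lines of code. This is a minimal sketch, assuming that the "last bit" of the CU field is the least significant bit of the TOS octet; the mask constant, the function names, the `RequestId` type and the sample addresses are hypothetical and not part of the DIS client library.

```python
from typing import NamedTuple

# Assumption: the last bit of the two-bit CU field is the least significant
# bit of the TOS octet.
DIS_FLAG_MASK = 0x01


def mark_dis(tos: int) -> int:
    """Return the TOS octet with the DIS identifier bit set to 1."""
    return (tos | DIS_FLAG_MASK) & 0xFF


def is_dis_packet(tos: int) -> bool:
    """True when the DIS identifier bit of the TOS octet is 1."""
    return bool(tos & DIS_FLAG_MASK)


class RequestId(NamedTuple):
    """Request ID built from the fields named in the text."""
    port: int
    protocol: str
    sender_ip: str
    receiver_ip: str


if __name__ == "__main__":
    rid = RequestId(port=5004, protocol="udp",
                    sender_ip="192.0.2.10", receiver_ip="198.51.100.7")
    tos = mark_dis(0x00)
    print(rid, hex(tos), is_dis_packet(tos))
```

Using a tuple of the four fields keeps the ID hashable, so a server could use it directly as a dictionary key when tracking requests.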

The request from an application can be represented as a program or as a
sequence of service parameters. Here requests are classified as QoS, CoS (Class
of Service), and CtS (Condition to Serve). QoS includes parameters such as
bandwidth, delay, jitter, error rate, scheduling and throughput. CoS is a
definition of service classes, e.g. high, medium, low. CtS denotes parameters
or a program representing the conditions under which the request is to be
served. An example of a simple CtS program is:

    if (delay <= 50) then
        grant access to the resources;
    else if (error rate < 10) then
        access to the resources in half level;
    else if (server down) then
        wait 10;
    else
        cancel request;
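
The CtS example above can be mirrored in an ordinary programming language to show how a server might evaluate it against measured conditions. This is only a sketch: the `NetworkCondition` type, the function name and the returned action strings are hypothetical, and the units of the thresholds 50 and 10 are left as in the example because the text does not state them.

```python
from dataclasses import dataclass


@dataclass
class NetworkCondition:
    delay: float        # same unit as the threshold 50 in the CtS example
    error_rate: float   # same unit as the threshold 10 in the CtS example
    server_down: bool


def evaluate_cts(cond: NetworkCondition) -> str:
    """Evaluate the example CtS program and return the resulting action."""
    if cond.delay <= 50:
        return "grant"          # grant access to the resources
    if cond.error_rate < 10:
        return "grant_half"     # access to the resources at half level
    if cond.server_down:
        return "wait 10"        # wait, then the request can be retried
    return "cancel"             # cancel the request


if __name__ == "__main__":
    print(evaluate_cts(NetworkCondition(delay=80, error_rate=3, server_down=False)))
```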

4.3 DIS Server

The DIS server runs on an active node. Here, an active node is a network node,
such as a router or gateway, that can not only store and forward packets but
also perform computation when necessary. This concept comes from active network
technology. Figure 3 depicts the location of the DIS server in the network
protocol.

The DIS server within the active node is the part that controls the network
services provided to users and applications. It receives a request from the
application in the form of a source program or parameters. To perceive the
condition of the network at all times, each active node periodically exchanges
information with its adjacent nodes, in addition to collecting information about
the condition of the node itself. From this information about network
conditions, the DIS server in the active node decides whether the network will
execute the request or not.




Figure 3. The flow of packets in active node with DIS 

The DIS server consists of four modules, which are depicted in figure 4. The
first is the reception module, which receives requests from the client, extracts
the request ID and identifies them. The reception module then passes the request
to the second module, the request analyzer module. The reception module also
receives the result from the decision module and passes the information on to
the application. In addition, it manages the request and its status, and reports
to the appropriate cooperating protocol, such as RIP, OSPF, RSVP, differentiated
services, MPLS, etc.

In the second module, the request analyzer module, the requests are analyzed.
First, the module determines whether the request is a program or a sequence of
service parameters. If the request is a source program, the validity of the
program is checked; if there is an error, an error message is sent to the client
that issued the request. If the request is a sequence of service parameters, the
module analyzes which service parameters can be provided.

The third module is the decision module. Based on the analysis result and the
information provided by the monitoring module, the decision module tests whether
the request can be executed. It then compiles the information needed to give the
request the best chance of execution. This information can take the form of a
rescheduled time, new service parameters, a protocol that should be used, or
information about the network condition. The decision module then composes the
information to be provided to the client.

The last module is the monitoring module. This module has two functions. The
first is to monitor local conditions, such as congestion, packet errors, and
other parameters. The second is to exchange information with adjacent nodes: the
module sends its local information and receives information about the condition
of each adjacent node. In addition, the module asks the adjacent nodes whether
they can share the resources needed for the user's request, and each adjacent
node in turn repeats the same process with its own adjacent nodes. The
monitoring module assembles all of this into a bundle of information about the
network condition that is needed to serve the requests of users or applications.
Monitoring is performed continually, in order to obtain up-to-date information
and provide a better information service to users or applications. Figure 4
shows the relation among these modules.




Figure 4. Connection among modules of DIS server in active node 



4.4 Scenarios 

As described above, applications must use the DIS client to connect to the DIS
server. The DIS client sends packets to the nearest DIS server and links the
application to it. Before sending a DIS packet to the DIS server, the DIS client
creates the packet with the DIS flag set to 1 and appends the ID of the
application to it. In an active node where a DIS server exists, only DIS packets
are forwarded to the DIS server; other packets are not (see Figure 3).

After receiving the DIS packet, the DIS server records the ID of the application
and stores it for a stated period. If there is no access from the same
application with the same ID within that period, the DIS server discards the
record. The DIS server then analyzes the request contained in the packet. Using
the result of the analysis, the decision module contacts the monitoring module
to take the network condition into account for executing the request. Based on
the information from the monitoring module and the requirements of the request,
the decision module decides whether the requested service can be provided at the
quality given by the requested service parameters. Unless the request can be
executed within the requested service range, the decision module draws up a
schedule, which is reported to the application via the DIS client.
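
The record keeping described here, where an application ID is stored for a stated period and discarded if the application does not access the server again, is a form of soft state. A minimal sketch is given below; the class name, its methods and the injectable clock are hypothetical and are not taken from the DIS implementation.

```python
import time


class RequestTable:
    """Soft-state table of application IDs that expire after a fixed lifetime."""

    def __init__(self, lifetime_s, clock=time.monotonic):
        self._lifetime = lifetime_s
        self._clock = clock          # injectable so that expiry can be tested
        self._last_seen = {}

    def touch(self, request_id):
        """Record (or refresh) an ID when a DIS packet carrying it arrives."""
        self._last_seen[request_id] = self._clock()

    def expire(self):
        """Discard IDs not seen within the lifetime and return them."""
        now = self._clock()
        stale = [rid for rid, seen in self._last_seen.items()
                 if now - seen > self._lifetime]
        for rid in stale:
            del self._last_seen[rid]
        return stale

    def __contains__(self, request_id):
        return request_id in self._last_seen
```

A server would call `touch` on every DIS packet and `expire` periodically, so that abandoned requests do not occupy state indefinitely.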

Concretely, for each request from the application, the network considers the
situation and then answers with one of the possibilities described in List 1
below.

1. Possible to execute the request immediately (Ack).

2. Not possible to execute the request (Nack).

3. At this time, not possible to execute the service at the requested level, but
   possible to execute it at a lower level.

   a. Execute at a lower level, then at the requested level after a certain
      time has passed (Ack&Pat&Time).

   b. Execute at a lower level; it is uncertain when execution at the requested
      level will become available. The application will be informed when it
      becomes possible (Ack&Pat&Inq).

4. At this time, not possible to execute the request, but one of the
   possibilities below, a or b, applies:

   a. Possible to execute the request after a certain time has passed
      (Ack&Wait&Time).

   b. It is uncertain when it will become possible to execute the request. The
      application will be informed when it becomes possible (Ack&Wait&Inq).

List 1. The example of the DIS server advice scenario
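
The six answers of List 1 map naturally onto an enumeration. The sketch below shows one way a decision function could choose among them; the four boolean inputs and the function name are hypothetical simplifications of what the decision module would derive from the monitoring information.

```python
from enum import Enum


class Advice(Enum):
    ACK = "Ack"
    NACK = "Nack"
    ACK_PAT_TIME = "Ack&Pat&Time"
    ACK_PAT_INQ = "Ack&Pat&Inq"
    ACK_WAIT_TIME = "Ack&Wait&Time"
    ACK_WAIT_INQ = "Ack&Wait&Inq"


def advise(full_now: bool, partial_now: bool,
           ever_possible: bool, time_known: bool) -> Advice:
    """Choose one answer from List 1 (inputs are illustrative assumptions)."""
    if full_now:
        return Advice.ACK                      # case 1
    if not ever_possible:
        return Advice.NACK                     # case 2
    if partial_now:                            # case 3: lower level available now
        return Advice.ACK_PAT_TIME if time_known else Advice.ACK_PAT_INQ
    # case 4: nothing available now, but possible later
    return Advice.ACK_WAIT_TIME if time_known else Advice.ACK_WAIT_INQ


if __name__ == "__main__":
    print(advise(False, True, True, False).value)
```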

Once the DIS server and the application have agreed to execute the request, the
DIS server passes the information to the cooperating protocol and asks it to
execute the request. Other request management scenarios are also possible, such
as advising on the best protocol with which to execute the request.



4.5 Relation with other Service Methods 

Many network service methods and mechanisms have been proposed by the IETF. We
stress that DIS does not replace those service methods or mechanisms; rather,
DIS uses the functions of each of them to achieve its goal. In this subsection,
we discuss DIS and its relation to integrated services, differentiated
services, MPLS and constraint-based routing (CBR).

The integrated services model is characterized by resource reservation. Before
data are transmitted, the application must first set up paths and reserve
resources. Besides best effort service, integrated services has two service
classes, guaranteed service and controlled-load service. DIS needs to provide
certain information in some cases, for example when the request from the
application concerns dynamic resource allocation. In such cases DIS can use the
reservation methods of integrated services/RSVP as a signaling protocol to
request resources or to obtain information about resources.

In differentiated services, packets are marked differently to create several
packet classes, and packets in different classes receive different services.
Based on the DS field, several differentiated services can be created.
Basically, a client must have a service level agreement (SLA) with its Internet
Service Provider (ISP) before it can receive differentiated services. There are
static SLAs, which are negotiated on a regular basis, and dynamic SLAs, which
use a signaling protocol to request services on demand. When DIS has to classify
the packets of a flow, it can use the SLA classification of differentiated
services. Here, DIS can provide the condition of the traffic or network
resources relevant to the requests of users or applications and to the service
classes.

MPLS is an advanced forwarding scheme that extends routing with respect to
packet forwarding and path control. The MPLS header is encapsulated between the
link layer and the network layer, so that an MPLS-capable router examines only
the label in this header. DIS can use the service concepts of MPLS [13] when it
has to classify traffic, forward packets or use tunneling mechanisms.

DIS can also use CBR. The goals of CBR are to select routes that can meet
certain QoS requirements and to increase the utilization of network resources.
Since CBR considers the network topology, the requirements of the flow, the
resource availability of the links, and other specified policies, it can be used
when DIS has to choose a route; DIS can also pass the request information of
users or applications to a router running CBR.

In summary, DIS does not work alone to provide services to users and
applications, but cooperates with other existing methods or mechanisms when
necessary, in order to achieve better performance and better utilization of
network resources.



5 Discussion 

The Demand Inquiring Service can be applied to various applications: adaptive
quality of service management, flexible multicast routing for applications such
as video conferencing, intelligent caching, load distribution for Web servers,
information searching, file transfer, remote processing and operation, etc. To
discuss the Demand Inquiring Service in this paper, we simulate an application
running on a network such as the one shown in figure 3.

Here we assume the case of a class at an Internet University. We ran the
simulation with the OPNET6.0L simulation tool. 600 students are located at
different places in the network. They have to access 10 video files from three
different video servers (see figure 3). Each file has a size of 100 Mbytes. The
video servers are connected to different access points of the network, and the
network consists of lines of various bandwidths. We simulated the application
with and without DIS. Without DIS, the access is carried out with best-effort
service. With DIS, the application used to access the video servers is modified
to use DIS.




394 Kustarto Widoyo et al. 



[Figure: network with video Server A, Server B and Server C]




First, before directly accessing the target video servers, the application
contacts the nearest DIS server via the DIS client to ask about the network
condition and the access possibilities. The DIS server then evaluates the request,
taking the utilization of network resources into account, and computes the
resources needed to provide the requested service with the best performance.
Since the content of the requests is the same, the DIS server just computes the
utilization of network resources and compares it with the requests from each
client. If the total utilization of the requested services is over a certain
value, the DIS server reschedules the execution of the requests and informs the
clients to postpone the execution. Postponing the execution when the utilization
of network resources is high leads to better performance of the network and the
application.
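
The rescheduling step described above can be sketched as follows. This is only a minimal illustration, not the DIS server used in the simulation: the `Request` type, the threshold of 0.8, the 5-minute slot length and the first-fit placement policy are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Request:
    client: str
    demand: float  # hypothetical unit: fraction of link capacity the transfer needs


def schedule(requests, threshold=0.8, slot_minutes=5):
    """Return {client: delay in minutes} using first-fit over time slots.

    A request is postponed to a later slot whenever adding it to an earlier
    slot would push that slot's utilization above `threshold`.
    """
    slots = []  # accumulated utilization of each time slot
    plan = {}
    for req in requests:
        for i, used in enumerate(slots):
            if used + req.demand <= threshold:
                break  # request fits into slot i
        else:
            # No existing slot has room: open a new, later slot.
            # (A request larger than the threshold still gets a slot of its own.)
            slots.append(0.0)
            i = len(slots) - 1
        slots[i] += req.demand
        plan[req.client] = i * slot_minutes
    return plan


plan = schedule([Request("a", 0.5), Request("b", 0.5), Request("c", 0.25)])
print(plan)  # client "b" is advised to start one slot later
```

In this sketch the advice returned to a client is simply a start delay, mirroring the "execute 5 minutes later" style of advice discussed in the text.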

The results of the simulations are shown in figures 6, 7 and 8. In the
simulation, we focused on measuring the utilization and the traffic in a
router. From the graphs, we see that the traffic in the router without DIS is
very high when requests are flowing, but the traffic is zero when there are no
requests (see figure 6).

On the other hand, when using DIS the traffic is always low. This is because the
DIS server always manages the traffic: when it receives the DIS packets from the
applications, it provides information that lets the applications achieve the best
performance, considers the information received from other servers, and
reschedules the execution time of all the requests. The DIS server rescheduled the
execution of applications in order to spread the executions over time. Here, the
application is simulated to follow what the DIS server advises. For example, if the
DIS server advises executing 5 minutes later, the application will access the
video server 5 minutes later. We can see the result in figure 7: the traffic is
low and distributed along the time axis.
Figure 6. The traffic in the DIS router

Figure 7. The traffic in the router (non DIS simulation)

Figure 8. The CPU utilization of the router with DIS and without DIS

In figure 8, the CPU utilization of the router when using DIS is compared with
the utilization when not using DIS. The CPU utilization is lower when using DIS.
From this figure, we also see that with DIS the utilization of the resources can
be managed in order to achieve better performance of the packet flows. The effect
of DIS here is that the requests are rescheduled in order to get the best
performance of the application and to distribute the utilization of the router
resources.

Another point is that the rescheduling or postponing of execution guided by the
DIS server affects the application directly and the user indirectly. Rescheduling
that considers the network condition and utilization also causes the user's
execution of the application to be rescheduled. From the point of view of the
user's schedule, this means that when the network is crowded, the user can do
other things. With the information about the network condition or the advice on
execution from the network, users can schedule their jobs, which means
optimizing the utilization of user time.



6 Conclusion 

We presented a new network service that we call Demand Inquiring Service (DIS).
This service uses active network technology to let users/applications place their
needs into the network, so that the network knows these needs and can optimize the
utilization of network resources, besides providing network information and advice
to the users/applications. With DIS, the network does not only store and forward
packets, but offers the services: store, compute, forward and give hospitality. In
some applications, part of what DIS supports can also be supported by the server
side, as in an end-to-end communication system. However, here we want to stress
the use of application demands and network information in order to support the
management of network resources and to improve the services to the
users/applications.

Finally, we would like to point out further issues left to be discussed. To
increase the compatibility of DIS, we need to implement the interworking between
DIS and other service mechanisms such as RSVP, differentiated services and MPLS,
and other protocols such as RIP, OSPF, etc.
Abstract. This paper investigates the application of active networking
technology to enhance Internet Engineering Task Force (IETF) Mobile IP.
We propose an active delivery scheme (ADS) that enhances the operation
of Mobile IP. ADS contains the following functionalities: redirection of
packets, fast handoff, efficient delivery of binding updates and a framework
for aggregating location update messages. Simulation results show
significant improvements in transmission control protocol (TCP) performance
with ADS. The performance of ADS is compared with those of the
original and optimised schemes. The improvements due to ADS are most
significant when both communicating ends are mobile and during high
handoff latencies.



1 Introduction 

Routing is a fundamental problem in wireless networks. The proliferation of
wireless computing within modern networking environments has seen many
problems emerge, in particular mobile host (MH) routing. The IETF is currently
standardizing Mobile IP based on the specifications in [1]. Mobile IP specifically
addresses routing between MHs on top of the existing IP protocol. Despite the
fact that significant progress has been made, many problems still exist. One of
the common problems in Mobile IP is the triangle routing problem. All packets
destined for the MH need to be intercepted by the home agent (HA) so they
can be forwarded to the MH. Optimized Mobile IP [2] solves the triangle routing
problem by sending a binding update to a corresponding host (CH), thus
reducing latency. Besides triangle routing, we observe the following shortcomings
with existing Mobile IP models [1][2]: registration latency, binding update
latency, inefficient handling of in-transit packets, inefficient binding update
delivery and lack of location update message aggregation. These shortcomings will
be elaborated in detail later in the paper.

The main theme of this paper is to investigate the use of active networks (ANs)
for Mobile IP and to carry out performance evaluation studies. Simulation studies
show that the drawbacks of existing Mobile IP models [1][2] can be overcome by
employing ANs.

H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 398-415, 2000. 

(c) Springer-Verlag Berlin Heidelberg 2000 
In this paper, the objectives are to show that: (1) intra-network processing helps
TCP recover faster after handoff and (2) Mobile IP can be enhanced significantly
with the deployment of our ADS. These objectives are realised using two novel
functionalities: aggregation of location update messages using the BusStation
framework, and a multicast-like delivery of binding updates.

2 Overview 

In our approach we assume a network interconnected by active routers (ARs). These
ARs are loaded with the ADS program as described in Section 2.1. The main
functions of the ADS are: (1) rerouting in-transit packets during handoff,
(2) fast handoff, (3) sending binding updates to CHs, and (4) aggregation of
location update messages using the BusStation framework.





Fig. 1. This figure shows the setup of ARs before and after handoff. ARs R5 
and R6 are programmed by the MH 



Figure 1 gives an overview of how ADS enhances the implementation of
Mobile IP. In Figure 1, R1 to R4 are programmed. The MH is moving from
FA1 to FA2. When the MH has migrated to FA2 it sends an active registration
request to inform its HA. As the active registration request traverses the
ARs, the ADS program is loaded onto routers such as R5 and R6 which are
not programmed a priori. At R3, no reprogramming is performed, but the ADS
program within R3 notes the MH’s new care-of-address and forwards the active
registration request. From this point on, any packets destined for the MH are
retunnelled to the MH’s new location by R3. In addition, a binding update
and a registration reply message are sent to the CH and the MH respectively,
which enables the CH to tunnel packets directly to the MH’s new location and
reduces handoff latency. Here we assume that security associations exist between
ARs, CHs, HA and the MH. Besides that, new CHs wishing to communicate with
the MH have a faster connection setup time since ARs can redirect packets to the
MH’s current care-of-address without going through the HA. The ADS program
at each AR keeps track of messages going to the HA and the number of hops
to each CH. This enables efficient delivery of binding updates.

Upon setup, each ADS creates a data structure which stores the mapping
between the MH’s home address and its care-of-address. Also stored are a list of
CHs communicating with the MH and the MH’s previous FA, if any. This
information allows the AR to tunnel packets to the MH and send binding updates.
The data maintained by each ADS for a given MH is referred to as a state and is
shown in Table 1. The corresponding states maintained for each CH are shown
in Table 2. The ADS program is maintained using soft state: a timer is used to
determine its time of service. The timer is refreshed each time the ADS processes
a packet.
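
The per-MH state and its soft-state timer can be pictured with a small sketch. The class and field names below (`MHState`, `CHRecord`, `lifetime`, `refresh`) are illustrative assumptions that follow the fields of Tables 1 and 2; the paper does not prescribe an implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CHRecord:
    """Per-CH data kept by an AR (fields follow Table 2)."""
    address: str
    update_sent: bool = False
    hops: int = 0
    latency: int = 0
    responsible_ar: Optional[str] = None


@dataclass
class MHState:
    """Per-MH state kept by an AR (fields follow Table 1)."""
    ar_id: str
    home_address: str
    care_of_address: str
    ha_address: str
    lifetime: float                 # assumed soft-state lifetime in seconds
    expires_at: float = 0.0
    previous_fa: Optional[str] = None
    chs: Dict[str, CHRecord] = field(default_factory=dict)

    def refresh(self, now: float) -> None:
        """Called whenever the ADS processes a packet for this MH."""
        self.expires_at = now + self.lifetime

    def expired(self, now: float) -> bool:
        """True once the timer has run out without a refresh."""
        return now >= self.expires_at


state = MHState("ar3", "10.0.0.5", "192.168.2.9", "10.0.0.1", lifetime=30.0)
state.refresh(now=100.0)
state.chs["172.16.0.7"] = CHRecord(address="172.16.0.7", hops=2)
```

With soft state, an AR that stops seeing packets for an MH simply lets the entry time out instead of requiring an explicit teardown message.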



Table 1. State: Data Maintained by AR for a given MH

Data                     Size (Bytes)
AR Identification        8
MH’s IP Address          4
MH’s Care of Address     4
MH’s HA address          4
CH addresses             N
Expiration Time          4
Security Association     8



Table 2. CH data maintained by AR

Data             Description                                            Size (Bytes)
Address          IP address of CH.                                      4
Update Status    This flag indicates whether a binding update has       1
                 been sent.
Hops             The number of hops to the CH.                          2
Latency          Observed packet latency.                               4
Responsible AR   IP address of the AR responsible for sending           4
                 binding request.



The main assumption made here is that after migration, ARs along the route
are able to intercept any packets destined for the MH. If, for example, the CH has
a direct link to the HA that bypasses the ARs, then ADS offers no advantage. In
such cases the CH will receive its corresponding binding update when the
registration request arrives at the AR local to the HA, and therefore the binding
update latency is reduced. We also assume that the paths are symmetric: the paths
taken by the registration request and reply are the same. This assumption
simplifies the update of ADSs when the registration reply passes through. In an
asymmetric network we could impose a hop-by-hop forwarding mechanism in
which each AR knows its upstream and downstream ARs, but this complicates the
operation of ADS because an additional protocol is required to keep track of the
programmed ARs between the HA and MH on a hop-by-hop basis.

2.1 Distribution of the ADS Program 

The ADS programs are distributed to ARs using the following method. The
ADS program is embedded in the registration reply message. This means the
HA is responsible for distributing the program itself, which enables simple code
management and also allows different domains to deploy different AN-based
solutions. The adaptation to mobility (i.e., the time for ADS to become
operational) is dependent on the time it takes for the registration request to
reach the HA and the time for the HA to program ARs. To enable faster adaptation,
the MH stores the program it receives from the HA in the registration reply. Since
we are taking a programmable approach, an active tag is used, thereby allowing
the corresponding ADS program to be loaded as discussed in the previous
section. As a result, when a MH initially migrates out of its domain, ARs are not
programmed, and thus there is no performance enhancement. During subsequent
registration requests, possibly when migrating to another domain, the MH embeds
the program in its registration request. At this point, since some routers have
already been programmed, the ADS takes action to improve the performance of
Mobile IP.

3 ADS: Protocols 

3.1 Hierarchical Handoff 

Previous work [3] has shown that MH migration is usually local (within a subnet, 
or a given domain). As part of the ADS handoff process, a registration reply is 
sent to the MH when a registration request is processed. Hence handoff latency is 
reduced since the MH does not rely on its HA. The ADS works both locally and 
globally; the ARs are dynamically set up promoting scalability. This is because 
as the MH migrates to a new domain, the local router becomes programmed and 
handles all registration requests within the given domain. This means a given 
domain is not required to deploy a dedicated protocol to ensure low-latency 
handoff. 

When an ADS receives an active registration request, it sends a registration
reply back to the MH confirming the handoff process. As a result, the handoff
latency is not dependent on the delay between the MH’s new location and its HA
but is dependent on the delay between itself and the closest programmed AR (the
first AR encountered by the active registration request which contains an ADS
program for the given MH). Referring to Figure 1, R3 generates a registration
reply to the MH once it has processed the active registration request.

In ADSs, all active registration requests are relayed to the MH’s HA.
Alternatively, we can have the ADS program relay the request only if the MH has
moved across subnets. As a result the number of active registration requests
directed towards the HA is reduced. The benefit of this feature is crucial in a
pico cell environment where the handoff rate is high. Due to the high mobility
rate, the HA experiences a higher number of registration requests from the MH,
hence increasing the bandwidth used and the HA load.

3.2 Packet Redirection 

When a MH migrates, packets are redirected at each ADS during the migration
itself. Simultaneously, the CHs are updated with the MH’s new care-of-address.
During this interval, ADS prevents packets from being forwarded to the
wrong address. When the ADS receives an ordinary packet it checks whether the
packet is going towards the correct destination. If the packet is headed towards
the wrong address then the packet is redirected. A check is done to determine
whether the packet is encapsulated. If so, the packet is decapsulated,
re-encapsulated with the MH’s care-of-address and forwarded to the MH.

The forwarding process can be augmented with customised functionalities. 
For example when the MH has a high handoff rate or the MH is unreachable 
(due to non-overlapping cell areas) the ADS can be programmed to delay the 
delivery of messages until the MH becomes available. In this paper, no special 
policies are implemented. 
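
The redirection check can be illustrated with a minimal sketch. The `Packet` type, the `bindings` table (home address to current care-of-address) and the decision to tunnel plain packets addressed to the home address are assumptions made for illustration, not the authors' code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Packet:
    dst: str                           # outer destination address
    payload: str = ""
    inner: Optional["Packet"] = None   # set when this packet is a tunnel wrapper


def redirect(packet: Packet, bindings: Dict[str, str]) -> Packet:
    """Return the packet an AR would forward, given its MH bindings."""
    # Decapsulate if needed to find the MH's home address.
    original = packet.inner if packet.inner is not None else packet
    coa = bindings.get(original.dst)
    if coa is None:
        return packet                  # not for an MH served here
    if packet.inner is not None and packet.dst == coa:
        return packet                  # already tunnelled to the right place
    # Wrong (stale) tunnel end point or untunnelled packet:
    # re-encapsulate towards the current care-of-address.
    return Packet(dst=coa, inner=original)


bindings = {"10.0.0.5": "192.168.2.9"}
stale = Packet(dst="192.168.1.7", inner=Packet(dst="10.0.0.5", payload="data"))
fixed = redirect(stale, bindings)
print(fixed.dst)  # tunnel now ends at the new care-of-address
```

A delay or buffering policy for unreachable MHs, as mentioned in the text, would slot in just before the final `return`.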



3.3 Efficient Delivery of Binding Updates 

In this section, an efficient binding update delivery scheme is presented. The
idea here is similar to a multicast-like approach where the CHs are the
receivers and the HA is the sender. Unlike multicast, the CHs are not required
to join a multicast group, nor is a multicast tree built. In ADS, each ADS program
keeps a hop count to a given CH. During processing of the active registration
request message, the ADS that is closest to a given CH generates a binding
update message. As a result the CH is notified earlier rather than waiting for
the HA to notify the CH of the new care-of-address. Moreover, the ADS reduces
the load on the HA, since the HA is not required to generate binding updates to
the CHs, and the bandwidth consumed is also reduced.



Hop Count First, we explain how the hops to each CH are measured by
each AR. The number of hops observed is used by ADSs to determine whether
they should send binding updates to a CH. Each router records the number
of hops required to reach a given CH that is communicating with a MH. To
calculate the number of hops, an Active Discovery Capsule (ADC) is used. The ADC
contains three pieces of information: hop count, security key^ and Resp_AR.
The security key is basically a security association between CH and routers, and
^ Here we are assuming that there are no mobile switching centre and whenever the 
MH crosses a cell boundary an active registration request is generated. 

^ e.g., public/private key pair. 
serves to authenticate all binding update messages and, most important of all,
to re-register after migration. The Resp_AR field contains the address of the AR
which is responsible for sending a binding update during handoff. The details
of which ARs take responsibility are elaborated in the next section.
The capsule is sent by a CH wishing to receive binding updates sooner. During
connection setup, the ADC can be piggybacked on setup messages, for example
the synchronization message during the setup phase in TCP. The destination
address of the packet containing the ADC is the MH’s current location.
At each AR downstream from the CH, the code within the ADC increments
the hop count value. Once the ADC reaches an AR that contains a state for the
given MH, the AR records the hop count value from the ADC and increments the hop
count value before forwarding it towards the MH. Therefore each router has a fair
idea of the distance to the CH.



Which AR Sends the Binding Updates? In order to determine which ARs
along the path from the HA to the MH are responsible for sending binding
updates during handoff, the following algorithm is used:

Upon receipt of an ADC, the AR checks whether it is the first to
intercept the capsule by accessing the Resp_AR field within the ADC. Note that
the AR processing the ADC must also be serving the corresponding MH.

    /* Get the CH’s address from the capsule */
    CH_addr = Get CH address from the capsule

    /* If we have a record, then hop count > 0. Otherwise return */
    /* 9999 to denote no route */
    CH_hop = Get_hop_count_to_CH(CH_addr)

    /* Get the AR address which is responsible for the given CH */
    R_AR_addr = Get Resp_AR field from capsule

    /* Current hop count recorded */
    Recorded_Hop = ADC.hop_count

    IF R_AR_addr is empty THEN {
        /* This means we take the responsibility */
        Create a record for the CH (See Table 4.2)
        Set Responsible AR variable to local IP address
        Set I_SENT flag to local IP address
    }
    ELSE IF R_AR_addr == local IP address THEN {
        IF CH_hop == Recorded_Hop THEN
            ADC_Discard = TRUE
    }
    ELSE {
        /* Check whether we have a better route to CH_addr */
        IF Recorded_Hop < CH_hop THEN {
            Send an update to the address stored in the Responsible AR variable.
            Set Responsible AR variable to our IP address.
        }
    }

    IF NOT ADC_Discard THEN {
        Increase_hop_count(ADC.hop_count);
        Forward_Packet(ADC)
    } ELSE
        Remove_packet(ADC)



Algorithm 3.1: Determining hop count to each CH 

Note that in Algorithm 3.1 we could simply assume that the responsible AR is
the router that first intercepted the ADC, so that the ADC can be discarded after
processing. This conserves bandwidth and processing, since the ADC does not
need to travel all the way to the MH’s care-of-address, but the hop count
information will be outdated once the MH or CHs migrate.
Apart from that, Algorithm 3.1 can be augmented to include other parameters
such as load/traffic. For example, if the AR that intercepts the ADC has a high
load, then it can choose not to be responsible for the CH. Hence the downstream
AR will handle the capsule. To determine whether a shorter route exists,
the AR searches its list of CHs (maintained for each MH) for the CH’s address.
If a record exists and contains a smaller hop count than the advertised hop
count, the current AR takes responsibility. Once a new AR takes responsibility,
the previously responsible AR is notified by the message CHANGE_AR. The
message CHANGE_AR contains the address of the AR that is taking over the
responsibility for a given CH. When an AR receives a CHANGE_AR, the AR
resets the variable Responsible AR. This also applies to intermediate routers that
intercept the CHANGE_AR message.
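
The responsibility decision can be sketched in runnable form as follows. This is a hedged illustration rather than the authors' implementation: the class names, the `outbox` message log and the stamping of the Resp_AR field into the capsule are assumptions, and the takeover test follows the prose rule above (a stored record with a smaller hop count than the advertised one).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

NO_ROUTE = 9999  # sentinel for "no record", as in the pseudocode


@dataclass
class ADC:
    ch_addr: str
    hop_count: int = 0
    resp_ar: Optional[str] = None   # AR currently responsible for this CH


class AR:
    def __init__(self, ip: str):
        self.ip = ip
        self.hops_to_ch: Dict[str, int] = {}
        self.outbox: List[Tuple[str, str, str]] = []  # (to, message, new AR)

    def process_adc(self, adc: ADC) -> bool:
        """Process a capsule; return True if forwarded, False if discarded."""
        stored = self.hops_to_ch.get(adc.ch_addr, NO_ROUTE)
        advertised = adc.hop_count
        discard = False
        if adc.resp_ar is None:
            # First serving AR to intercept the capsule takes responsibility.
            self.hops_to_ch[adc.ch_addr] = advertised
            adc.resp_ar = self.ip
        elif adc.resp_ar == self.ip:
            # Our own role echoed back: nothing changed if the hops agree.
            discard = (stored == advertised)
        elif stored < advertised:
            # We hold a shorter route: notify the old AR and take over.
            self.outbox.append((adc.resp_ar, "CHANGE_AR", self.ip))
            adc.resp_ar = self.ip
        if discard:
            return False
        adc.hop_count += 1
        return True


ar2, ar3, ar4 = AR("ar2"), AR("ar3"), AR("ar4")
ar4.hops_to_ch["ch1"] = 1            # ar4 already knows a one-hop route
adc = ADC("ch1", hop_count=1)
for router in (ar2, ar3, ar4):
    router.process_adc(adc)
print(adc.resp_ar, ar4.outbox)
```

Here `ar2` first takes responsibility, `ar3` has no record and only forwards, and `ar4` takes over because its stored hop count is smaller than the advertised one.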



Example 

To illustrate the binding update delivery scheme further, consider Figure 2. For
this example, assume that the MH and CH1 have not migrated to the BSs shown in
Figure 2. Also, except for CH1, all CHs reside at the corresponding FAs shown
in Figure 2.




Fig. 2. Example topology, a MH communicating with four fixed hosts 



In Table 3 we see that upstream ARs have fewer hops to a given CH
(given the MH’s current location). For example, the hop count from
AR3 to CH1 is two whereas that from AR2 to CH1 is one. When a registration
request is received, an AR checks its list of CHs and generates binding updates
only to those CHs it is responsible for. If an AR observes an alternative route
to a given CH, a control capsule is sent to the responsible AR (possibly
an upstream AR). This scheme is passive in that ARs do not actively exchange
Table 3. Hops information for CHs maintained by each AR

Router   CH1    CH2    CH3    CH4    CH5
1        None   None   None   None   1
2        1      None   None   None   2
3        2      None   None   None   3
4        3      None   None   1      4
5        3      1      None   2      5
6        4      2      1      2      6


control messages and do not use an elaborate discovery mechanism. Hence low 
overheads are observed. In the event of ARs observing the same number of hops, 
the upstream AR is used. 



MH/CH Migration When the MH/CH migrates, the router previously
responsible for a CH may no longer be valid. Therefore, an update mechanism is
required to determine whether the responsibility roles have changed. In ADS,
the update process is initiated during the handoff process, in other words, when
a registration request is received or upon receipt of a binding update sent to the
previous BS. When any of these messages is received, a binding update is sent
towards each CH for which the AR has responsibility. In addition, each CH is
informed to generate an ADC towards the MH’s new location. Also, in the ADC, the
local IP address is included, which is echoed in the ADC generated by the CH. When
a router serving the MH intercepts the new ADC, the ADC is authenticated.
Algorithm 3.1 is then executed to determine whether the roles have changed.
If the Resp_AR variable in the ADC is equal to the local IP address,
then no roles have changed. Hence the ADC is discarded. On the other hand,
if the Resp_AR variable does not match the local IP address, and the recorded
hop count is smaller, then an invalidate message is sent to the router recorded
in the Responsible AR. Here the Responsible AR could be empty when the MH
migrates upstream; for example, in Table 3, AR1 does not have an entry for CH1
to CH4.

In Figure 2 the MH migrates to FA1. As a result, a registration request will
be sent to the HA and a binding update will also be generated to FA3. Referring to
Table 3, before handoff AR5 has a hop count of one to CH2. After migration the
responsible ARs for CH2 and CH3 have to change. Note that upon handoff, the
corresponding states for the given MH at R5 and R6 are unloaded. Therefore
the responsible AR after handoff is R4. After receiving new ADCs, the hops
information shown previously in Table 3 is updated to that of Table 4.

3.4 The BusStation Framework 

A novel feature here is the aggregation of binding updates and registration
requests through the use of an adaptive BusStation framework. The idea of the
BusStation framework is similar to a bus in real life, thus the name BusStation.
Table 4. Hops Information for CHs maintained by each AR after MH migration

Router   CH1    CH2    CH3    CH4    CH5
1        None   None   None   None   1
2        1      None   None   None   2
3        2      None   None   None   3
4        3      2      3      1      4
5        None   None   None   None   None
6        None   None   None   None   None


Note that the term ’bus’ is due to the similarity to the operation of buses in real
life and does not refer to the definition of bus in the computer communications
context. The bus in ADS is represented as a capsule containing multiple
location update messages. Each AR acts as a BusStation where buses wait for a
particular time. While the bus is waiting at an AR, it takes on passengers (i.e.,
location update messages) and it moves on to the next station downstream when
the waiting time expires. The details of each aspect of the BusStation framework
are presented in the remainder of this section.

To demonstrate the BusStation framework, we apply it to the aggregation of
location update messages. The rationale for aggregating location update messages
is based on the likely possibility that in the near future, as MHs become
ubiquitous, a sizeable share of Internet traffic will consist of update
messages. The objective here is to reduce the number of packets forwarded to
the HA or CHs. The direct consequence of this is the reduction of
load at routers and conservation of bandwidth. Badrinath et al. [4] have shown
the benefits of aggregating small packets. This observation can be applied to
registration requests as well since each registration request is only 28 bytes.
As registration requests are handled by ADS, MHs do not depend on their HAs
for fast handoff and CHs need not wait for HAs to send them binding updates.
In other words, a delay in location update messages does not reduce the
performance of Mobile IP given that ADS is deployed. The aggregation of binding
updates is more suitable for CHs with a high number of MHs, for example a web
server. A high mobility rate results in the flooding of binding updates towards
the CH, which severely increases its load. However, this scheme might not be
useful when the number of location update messages is low.

For brevity, this section will only describe the procedure for aggregating
registration requests. Note that the same procedure applies to binding updates.
When the BusStation is enabled, the ADS invokes a program called ABus. The
ABus program described in the following paragraph basically tracks all location
update messages going to the same HA. Upon arrival, a location update message
is processed by the ADS and passed to the ABus. The ABus then determines
whether there is a bus waiting to take packets to the given HA. If a bus exists
then the packet is ’loaded’ onto the bus. Otherwise a new bus is created and the
packet is ’loaded’. A bus is associated with a waiting timer. This determines how
long a given bus waits for location update messages. Once the waiting timer
expires, the bus takes off and moves to the next hop to pick up packets. When the
bus reaches the AR local to the HA, the bus unloads its ’passengers’. Alternatively,
if the HA supports ABus then the bus is unloaded at the HA. As a result the
number of packets traversing towards the HA is reduced.



The Bus The bus created by the ABus program is shown in Table 5. Basically
the bus encapsulates a sequence of packets. The header information is used by
the ABus program to manage buses. Once the bus is created, ABus allocates
a slot for it. Assuming a bus size of 576 bytes, a maximum of 19 registration
requests (passengers) with 28 bytes per registration request is possible. If this
limit is exceeded a new bus is created and the full bus’s status is updated to
express, which means it does not stop at any node. The express bus is then
forwarded to its destination. Note that each bus has maximum waiting time
and age parameters. These parameters define how long the bus may wait
at a particular AR and the upper limit on the packet latency.
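
As a worked check of the figure of 19 passengers, the arithmetic below subtracts the fixed ABus header fields (destination address, age, size, status, max wait time and seat numbers, 14 bytes in total according to the ABus packet table) from the 576-byte bus. The additional 20-byte IP header is an assumption made here to reproduce the stated limit; the text does not give this breakdown.

```python
BUS_SIZE = 576                        # bytes, from the text
REQUEST_SIZE = 28                     # bytes per registration request, from the text
BUS_HEADER = 4 + 2 + 1 + 1 + 2 + 4    # fixed ABus fields: 14 bytes
IP_HEADER = 20                        # assumption: basic IPv4 header without options


def max_passengers(bus_size: int = BUS_SIZE, ip_header: int = IP_HEADER) -> int:
    """Number of whole registration requests that fit into one bus."""
    return (bus_size - ip_header - BUS_HEADER) // REQUEST_SIZE


print(max_passengers())               # 19 under the assumptions above
print(max_passengers(ip_header=0))    # 20 if no IP header is counted
```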

The algorithm executed at the ABus when a location update message or a
bus arrives is shown below:

    wt = Calculate_Waiting_Time_For(HA address)

    IF LOCATION_UPDATE_MESSAGE THEN {
        Dest = Get_Destination_From_Packet(HA address)
        IF Exist_Bus_Going_To(Dest) THEN {
            Load_Packet_Into_Bus(Bus_No, LOCATION_UPDATE_MESSAGE)
        } ELSE {
            Create_New_Bus_To(Dest, wt)
            Load_Packet_Into_Bus(Bus_No, LOCATION_UPDATE_MESSAGE)
        }
    }

    IF BUS and NOT express mode THEN {
        Dest = Get_Destination_From_Bus_Header(Bus)
        MaxWait = Get_Max_Wait_Time(Bus)
        IF NOT Exist_Bus_Going_To(Dest) AND wt < MaxWait THEN {
            Install_Bus_At_Slot_No(Dest, wt)
        } ELSE forward bus to next hop
    }

Algorithm 4.2: Pseudocode for ABus Program 
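
The ABus behaviour can be made concrete with a small runnable sketch. It is a simplified illustration under stated assumptions: the names `Bus`, `ABus`, `on_update`, `on_bus` and `tick` are hypothetical, the waiting time is a fixed constructor argument rather than being computed per destination, and forwarding is modelled by appending to a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List

CAPACITY = 19  # registration requests per bus, as stated in the text


@dataclass
class Bus:
    dest: str
    depart_at: float
    express: bool = False
    passengers: List[bytes] = field(default_factory=list)


class ABus:
    def __init__(self, wait_time: float):
        self.wait_time = wait_time
        self.slots: Dict[str, Bus] = {}   # hash table keyed by destination
        self.forwarded: List[Bus] = []    # buses sent on to the next hop

    def on_update(self, dest: str, msg: bytes, now: float) -> None:
        """A location update message arrives: load it onto a bus."""
        bus = self.slots.get(dest)
        if bus is not None and len(bus.passengers) >= CAPACITY:
            bus.express = True            # full bus no longer stops anywhere
            self.forwarded.append(bus)
            bus = None
        if bus is None:
            bus = Bus(dest, depart_at=now + self.wait_time)
            self.slots[dest] = bus
        bus.passengers.append(msg)

    def on_bus(self, bus: Bus, now: float, max_wait: float) -> None:
        """A bus arrives from upstream: park it or pass it on."""
        if not bus.express and bus.dest not in self.slots and self.wait_time < max_wait:
            bus.depart_at = now + self.wait_time
            self.slots[bus.dest] = bus
        else:
            self.forwarded.append(bus)

    def tick(self, now: float) -> None:
        """Release every bus whose waiting timer has expired."""
        for dest in [d for d, b in self.slots.items() if b.depart_at <= now]:
            self.forwarded.append(self.slots.pop(dest))


station = ABus(wait_time=2.0)
station.on_update("ha1", b"req-1", now=0.0)
station.on_update("ha1", b"req-2", now=1.0)
station.tick(now=2.0)
print(len(station.forwarded[0].passengers))  # two requests left in one bus
```

Keying the slots by destination address mirrors the hash table described under Bus Management, so at most one bus per HA waits at a station at any time.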



Table 5. ABus packet information

Field                 Size (Bytes)   Description
Destination Address   4              Used by ABus program to determine where
                                     bus is headed.
Age                   2              How long the bus should stay in the network.
Size                  1              The bus’s size.
Status                1              1 means express, 0 otherwise.
Max Wait Time         2              The maximum time it waits at a given node.
Seat Numbers          4              The number of packets in the bus.
Passengers            N x 28         The packets loaded onto the bus. 28 here
                                     means the size of a registration request.



Bus Management As mentioned, each bus is allocated a slot by the ABus
program. This is implemented as a hash table where the destination address is
the key. When a bus arrives, ABus checks whether a bus for that destination
already exists; if not, the arriving bus is inserted in its slot. A waiting timer
is assigned to the bus. When the timer expires the bus is forwarded to the next
hop.
Waiting Time The waiting time affects the registration request’s latency and the
number of pickups by a given bus. If the waiting time is too long then the
bus might overflow and a new bus is created, thus increasing the number of
packets heading towards the HA. Furthermore, the registration request’s lifetime
might expire before the bus reaches its destination. The objective here is to find
a tradeoff between the reduction of packets directed towards the HA and the
waiting time.

The waiting time is calculated based on the following policy. The waiting
time for a given bus is dependent on the rate of arrival of registration requests.
In other words, each AR has to monitor the frequency of registration requests
heading toward a particular destination. It then calculates the rate of registration
requests for each HA. This rate is then used to calculate the waiting time for
a given bus. If the rate is high the bus’s waiting time is longer, and vice versa.
The opposite policy is also possible: given a high rate, the waiting time is
reduced, because the bus fills quickly and therefore less time is needed. In
ABus, the first approach is taken because it was found to result in a higher
number of aggregated packets.

The calculation of the waiting time is similar to that of TCP’s calculation
of RTT [5]. The difference here is that a waiting time that maximizes packet
reductions given different arrival rates is considered. The formulas for the
waiting time and the inter-arrival time are as follows:

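$$\text{Update Inter-arrival Rate } (\mathit{IntA}) = \alpha\,\mathit{IntA}_{new} + (1 - \alpha)\,\mathit{IntA}_{old} \tag{1}$$

$$\text{Waiting Time} = \alpha\,(\ldots) + (1 - \alpha)\,\mathit{IntA}_{old} \tag{2}$$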

The variables IntA_new and IntA_old in Equation 2 refer to the new and old
inter-arrival rates (from Equation 1) of registration requests respectively. α is a
smoothing factor which determines how much weight is given to the old
inter-arrival time. Capacity refers to the bus's seating capacity. The seating
capacity is used because a bus's capacity determines how long it waits: if it has
more available space then it is likely to wait longer. IntA is calculated at each
AR for each destination (i.e., HA). Each AR will have a different set of IntAs and
waiting times. Therefore, as the bus traverses the network, its waiting time varies
as it passes through ARs with different location update message arrival rates.
When a bus arrives, no calculation of IntA is required, since a bus arrival does
not reflect the rate of registration requests and packets are not transferred from
one bus to another. The associated waiting time for the bus is then retrieved and
Equation 2 is used to calculate the waiting time.
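
The smoothing step of Equation 1 can be sketched as follows. This is only an
illustration under stated assumptions, not the ABus code: the class name
InterArrivalEstimator, the per-HA dictionaries, and the choice to seed the
estimate with the first observed gap are all hypothetical, and IntA is treated
as the smoothed gap between successive registration requests for one HA.

```python
from typing import Dict, Optional


class InterArrivalEstimator:
    """Per-destination smoothed inter-arrival estimate (Equation 1)."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie in [0, 1]")
        self.alpha = alpha
        self._last_arrival: Dict[str, float] = {}  # HA -> time of last request
        self._int_a: Dict[str, float] = {}         # HA -> smoothed IntA

    def observe(self, ha: str, now: float) -> Optional[float]:
        """Record a registration request for `ha` seen at time `now`.

        Returns the updated IntA, or None while only one request was seen.
        """
        last = self._last_arrival.get(ha)
        self._last_arrival[ha] = now
        if last is None:
            return None
        sample = now - last                    # IntA_new
        old = self._int_a.get(ha, sample)      # assumption: seed with first gap
        # Equation 1: IntA = alpha * IntA_new + (1 - alpha) * IntA_old
        self._int_a[ha] = self.alpha * sample + (1.0 - self.alpha) * old
        return self._int_a[ha]
```

An AR would keep one such estimator and call observe() for every registration
request it forwards; a bus arrival would not call it, in line with the text above.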

To reduce the state maintained at each AR, an expiration timer is associated
with the IntA and waiting time of each destination. Due to MH migrations the
rate of registration requests at the AR varies over time. As a result, registration
requests from a given MH might not pass through an AR again, and its
state therefore needs to be removed. This improves scalability since there is no
need to keep track of all MHs currently in foreign networks. The timer is set to
one second in the simulation. When this timer expires the state for the
destination is removed. When there are new registration requests, the rate and
corresponding waiting time are recalculated for the given HA.
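
A minimal sketch of this soft-state cleanup is given below. The names
(ExpiringRateTable, refresh, purge) are hypothetical and the explicit `now`
argument stands in for the simulator clock; only the one-second lifetime comes
from the text.

```python
from typing import Dict, List, Tuple


class ExpiringRateTable:
    """Keeps (IntA, waiting time) per HA and drops entries that are not refreshed."""

    def __init__(self, ttl: float = 1.0) -> None:  # one second, as in the simulation
        self.ttl = ttl
        # HA -> (IntA, waiting time, expiry deadline)
        self._entries: Dict[str, Tuple[float, float, float]] = {}

    def refresh(self, ha: str, int_a: float, waiting_time: float, now: float) -> None:
        """Store recalculated values and restart the timer for this HA."""
        self._entries[ha] = (int_a, waiting_time, now + self.ttl)

    def purge(self, now: float) -> List[str]:
        """Remove every HA whose timer has expired; return the removed HAs."""
        expired = [ha for ha, (_, _, deadline) in self._entries.items()
                   if deadline <= now]
        for ha in expired:
            del self._entries[ha]
        return expired

    def __contains__(self, ha: str) -> bool:
        return ha in self._entries
```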



4 Evaluation 

To investigate the performance of Mobile IP coupled with ADS we used ns-2,
developed at Lawrence Berkeley National Laboratory [6]. In the simulator we
used the Mobile IP code (version 1.4) from the ns-2 distribution and we augmented
the code to include optimized Mobile IP. We did not implement the full
functionality presented in [2]. The augmentation only enables CHs to receive
binding updates, which results in the loading of the tunneling code that
encapsulates all outgoing packets from the CH. The topology used in our simulation
is shown in Figure 3. All links interconnecting ARs (R1 to R10) and end-hosts
are set to 10 Mbps and the wireless link has a data rate of 2 Mbps. Depending
on the experiment involved, the MHs are set to migrate at a predetermined
time to either FA1 or FA2. In all of our experiments MH1 is the CH. Except
for Experiment 3, MH1 remains stationary for the duration of the simulation.
Configuration parameters and topology variations that are specific to a given
experiment are explained later in this section.

We inject the ADS program into ARs using the programmable switch
approach [7]. In our simulation, the ANEP header [8] is used to represent an active
tag which determines whether to inject an ADS program or not. Currently, the
header only contains a reference to the ADS program to be loaded.
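
The tag check can be pictured with the sketch below. It is purely illustrative:
the Packet and ActiveRouter classes and the program_ref field are hypothetical
stand-ins for the ANEP-carried reference, and no attempt is made to reproduce
the ANEP wire format or the ns-2 code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Packet:
    payload: bytes
    program_ref: Optional[str] = None  # stand-in for the reference in the active tag


Program = Callable[[Packet], str]


class ActiveRouter:
    def __init__(self, repository: Dict[str, Program]) -> None:
        self.repository = repository        # programs that may be injected
        self.loaded: Dict[str, Program] = {}

    def receive(self, pkt: Packet) -> str:
        if pkt.program_ref is None:
            return "forward"                # untagged packet: ordinary forwarding
        if pkt.program_ref not in self.loaded:
            # Inject the referenced program the first time its tag is seen.
            self.loaded[pkt.program_ref] = self.repository[pkt.program_ref]
        return self.loaded[pkt.program_ref](pkt)
```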




Fig. 3. Network topology used for simulation

In our simulation studies we compare the performance of TCP (NewReno)
over Original Mobile IP, Optimized Mobile IP and ADS^. One objective of the
simulations is to determine whether ADS improves TCP's throughput when
handoffs occur. Interested readers not familiar with the behaviour of TCP
in a heterogeneous network are referred to [9] [10]. Further details concerning
different flavours of TCP and performance studies on multiple packet loss can
be found in [11].

We tested the active delivery process on random mesh and hierarchical
topologies. In the mesh topology there are 25 nodes connected randomly with a
degree of two. In the hierarchical topology we consider a binary tree with four
levels. At each simulation run, CHs, FAs and the HA are connected randomly
to the topologies considered. Therefore the MH experiences a combination of
inter- and intra-domain handoffs. We consider one MH, and the migration time is
set randomly for each run. At each handoff we record the number of hops required
to send the binding updates from an AR, and the update latency for each CH is
measured. The update latency is the time taken for the binding update message
to reach a given CH from the time of handoff.
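
The hop-count metric can be illustrated on the hierarchical topology with a
short breadth-first search. The node numbering (root = 1, children of v are 2v
and 2v + 1) and the function names are assumptions made for this sketch; the
random attachment of CHs, FAs and the HA used in the simulation is not modelled.

```python
from collections import deque
from typing import Dict, List


def binary_tree(levels: int) -> Dict[int, List[int]]:
    """Adjacency list of a complete binary tree with the given number of levels."""
    n = 2 ** levels - 1
    adj: Dict[int, List[int]] = {v: [] for v in range(1, n + 1)}
    for v in range(2, n + 1):
        adj[v].append(v // 2)   # link to parent
        adj[v // 2].append(v)   # link from parent
    return adj


def hops(adj: Dict[int, List[int]], src: int, dst: int) -> int:
    """Number of links on a shortest path from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("destination unreachable")
```

In a four-level tree, an update sent between two leaves under the same parent
needs two hops, whereas one that must cross the root needs six, which is the
kind of difference the hop-count measurement is meant to expose.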

The topology shown in Figure 3 was used to simulate the BusStation
framework. Instead of HAs, MHs and FAs, traffic generators are attached to the
network. These traffic generators are responsible for generating different
registration request rates (30 distinct HA identifications). All the generators
generate packets of size 28 bytes with an exponential distribution. The burst and
idle times are set to 30ms and 1ms respectively. A sink connected to R10 counts
the number of buses and registration requests and collects other statistics such
as packet latencies. Apart from that, each generator node has a TCP session. This
creates cross-traffic on the network, and we also investigated the impact of
aggregating ARRs on the end-to-end latency of TCP traffic. To study the benefits
of the BusStation option we performed two experiments. The first investigates
the impact of a constant waiting time given varying traffic rates. In this
experiment all generators are set to transmit at rates ranging from 20 to 200 kilobytes.
The second experiment investigates the use of the adaptive waiting time.
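
The effect studied in the first experiment can be reproduced in miniature with
the sketch below, which counts how many buses leave an AR for a fixed waiting
time. It is a simplified model under our own assumptions (a single AR, Poisson
arrivals instead of the burst/idle generators, a bus that departs either when
its waiting time elapses or when it is full); the function names are hypothetical.

```python
import random
from typing import List


def poisson_arrivals(rate: float, duration: float, seed: int = 0) -> List[float]:
    """Arrival times of a Poisson process with the given rate (requests/second)."""
    rng = random.Random(seed)
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            return out
        out.append(t)


def count_buses(arrivals: List[float], waiting_time: float, capacity: int) -> int:
    """Buses needed when each bus waits `waiting_time` after its first request
    and leaves early once `capacity` requests are on board."""
    buses = 0
    seats_used = 0
    departure = float("-inf")
    for t in sorted(arrivals):
        if seats_used == 0 or t > departure or seats_used == capacity:
            buses += 1                    # this request starts a new bus
            seats_used = 1
            departure = t + waiting_time
        else:
            seats_used += 1               # request boards the waiting bus
    return buses
```

Comparing len(arrivals) with count_buses(arrivals, w, capacity) for several
values of w gives the packet reduction for that waiting time; a waiting time
of zero degenerates to one bus per request.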

The following sections present the results obtained from the exhaustive
simulation studies. Note that in each experiment we investigate specific
functionalities of ADS. For example, in Experiment 1 we do not incorporate the
SmartBuffer scheme. This is because we want to investigate each of ADS's
functionalities and analyse the performance gain due to the given functionality
alone.

4.1 Experiment 1: WAN Migration 

In this scenario we consider the migration of an MH in a WAN. The network
topology is modified as follows. The links between R10 and HA1, and between R4
and R5, are changed to 2 Mbps links.

MH2 is set to migrate from its home network to FA2 at t = 40 (where t
corresponds to the simulation time unit). MH2 stays at FA2 for 10 seconds before
it migrates to FA1. When MH2 has migrated to FA1, its registration request is
intercepted by R6, which in turn sends a registration reply. In the other models
for Mobile IP the HA handles the MH's requests, which is clearly inefficient in
this topology since there is a large distance between the MH and the CH.
Furthermore, since the HA sends a binding update to MH1 after the MH has
migrated, the binding update is delayed by the time it takes for the registration
request to reach the HA plus the time for the binding update to reach MH1. In ADS
the binding updates (for FA2 and MH1) are generated once R6 receives a
registration request. Hence the updates are generated at ARs that are closest to
the CHs and the old FA. Note that after migration to FA1 any further migration
within the FA1 subnet will be handled by R4. The AR model achieves a
significant improvement in performance by generating a binding update to the CH.
A measure of the improvement can be seen in Table 6: with R6 sending the
binding update, the CH gets updated 4ms sooner.

^ We have also studied the performance of TCP with selective acknowledgement.

Enhancements to Mobile IP with Active Networks

Table 6 shows the update time and registration time compared to the ARs
approach. The registration and update times for both the original and optimised
Mobile IP are the same. A better solution is to have R4 handle the handoff
and proceed to send an update to the CH and a registration reply to the MH. The
original registration request is forwarded on to the HA. Therefore the CH is
updated sooner and handoff time is greatly reduced. In Table 6 the handoff times
are chosen arbitrarily. In the simulation at t = 40 the MH moves to a predetermined
location. This does not mean that the MH has to be at the final location before
a radio connection is established. The handoff time in Table 6 does not include
the latency of radio establishment. It is the latency from the time the MH sends
out a registration request to the receipt of the registration reply.

Figure 4(a) shows two drops in the congestion window, at t = 40 and t = 50. In
both cases the AR approach recovers faster compared to the other two approaches.
It is interesting to note that in the first handoff, the congestion window in the
ARs approach recovers faster. This is because R6 sends a binding update and
registration reply sooner than the Mobile IP protocols.

Fig. 4. The MH is set to handoff at t = 40 and t = 50. (a) Congestion Window
Size, (b) Sequence Number for WAN Migration

Kwan-Wu Chin et al.

Table 6. Handoff and binding update latencies for WAN migration

Time          Original   Optimised   Active Routers
Handoff 1     6.4ms      6.94ms      2.30ms
Handoff 2     39.22ms    11.30ms     2.78ms
CH Update 1   2009ms     2009ms      2005ms
CH Update 2   12ms       10ms        0ms
FA1 Update    10ms       10ms        20ms
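
The savings implied by Table 6 can be recomputed directly from its entries.
The snippet below is only a convenience for the reader; the dictionary simply
transcribes the table and the helper names are ours.

```python
# Latencies in milliseconds, transcribed from Table 6.
latency_ms = {
    "Handoff 1":   {"Original": 6.4,   "Optimised": 6.94,  "Active Routers": 2.30},
    "Handoff 2":   {"Original": 39.22, "Optimised": 11.30, "Active Routers": 2.78},
    "CH Update 1": {"Original": 2009,  "Optimised": 2009,  "Active Routers": 2005},
    "CH Update 2": {"Original": 12,    "Optimised": 10,    "Active Routers": 0},
    "FA1 Update":  {"Original": 10,    "Optimised": 10,    "Active Routers": 20},
}


def saving_ms(row: str, baseline: str) -> float:
    """Latency saved by the active-router approach relative to `baseline`."""
    return latency_ms[row][baseline] - latency_ms[row]["Active Routers"]


def reduction_pct(row: str, baseline: str) -> float:
    """The same saving expressed as a percentage of the baseline latency."""
    return 100.0 * saving_ms(row, baseline) / latency_ms[row][baseline]
```

For instance, saving_ms("CH Update 1", "Original") gives the 4ms quoted in the
text, while the FA1 Update row is the one entry in the table where the
active-router figure is higher than the baselines.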



4.2 Experiment 2: Impact of CH’s Migration 

In this experiment we show the detrimental effects of CH migration. This is
where the MH is communicating with another MH across the fixed network. In
general, the distance between the MH and the (now mobile) CH either decreases or
increases. In either case the sender's retransmission timeout value becomes
invalid. This is because the data segments and acknowledgement packets that have
to be rerouted through the HAs of each MH may cause multiple retransmission
timeouts. In the optimized Mobile IP case, delay in binding updates may cause
the loss of acknowledgements, which results in waiting for a retransmission timer
expiration before reinitiation of data flow.
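
Why a change in path length invalidates the timeout can be seen from the
standard TCP estimator. The sketch below follows the smoothed-RTT computation
of RFC 6298 (alpha = 1/8, beta = 1/4, RTO = SRTT + 4 * RTTVAR) and is our own
illustration, not part of the simulation code; clock granularity and the
minimum RTO are deliberately left out, and the timer in ns-2 may differ in
these details.

```python
from typing import Optional


class RtoEstimator:
    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation

    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None

    def update(self, sample: float) -> float:
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.rto

    @property
    def rto(self) -> float:
        return self.srtt + 4 * self.rttvar
```

After a long run of stable samples the RTO settles close to the measured RTT,
so a handoff that suddenly lengthens the path makes the next acknowledgement
arrive after the timer has already fired.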

The migration time for the CH and MH is set at t = 5.0. In this scenario
both data segments and acknowledgement packets may experience losses. The
performance of TCP in this scenario for both Mobile IP models and ADS is
shown in Figure 5. As can be seen, both Mobile IP models recover poorly, for
the reasons discussed above. In Figure 5(b) we see that only two packets
need to be retransmitted with ADS. The original Mobile IP only manages to
recover after t = 5.5, that is, after both HAs have received registration requests
from their respective MHs. For the optimized Mobile IP, due to delays in binding
updates some packets are lost at the MH's previous location (packets 452 and
453), and thus it takes longer for TCP to exit from its fast recovery phase.

4.3 Experiment 3: Delivery of Binding Updates 

In this experiment we quantify the efficiency achieved by having ARs monitor
and send binding updates appropriately. The average hop count recorded by
ARs and the average latencies observed by CHs are shown in Tables 7 and 8.
From the tables we can see that for both topologies ARs perform better than
optimized Mobile IP.

Fig. 5. Mobile CH. (a) Congestion Window



Table 7. Randomized Mesh Topology

Schemes     Average Hops   Average Latencies (seconds)
AR          2.9            0.22
Optimized   5.9            0.38



4.4 Experiment 4: The Benefits of Aggregating Registration Requests

This experiment investigates the benefits of enabling the BusStation framework.
Firstly, the effects of varying the waiting time are studied. Note that the waiting
time here is not calculated based on Equation 2, because the amount of
reduction in packets that can be achieved and its impact on end-to-end latencies
need to be quantified. The results of this experiment are shown in Figures 6(a)
and (b). Secondly, an adaptive waiting time is incorporated: Equation 1 is used to
determine the rate of registration requests, which in turn dictates the waiting
time required using Equation 2. The corresponding results when the adaptive
waiting time is incorporated are shown in Figures 7(a) and (b).

In Figure 6(a), it can be seen that with increasing rates the number of packets
aggregated is high. At lower rates the bus needs to wait longer to maximize the
number of aggregated packets, at the cost of increased end-to-end latency. It is
evident in Figure 6(a) that at higher rates the end-to-end latency of packets is
reduced due to the reduction in packet processing at each router. A fixed waiting
time does not adapt well since different routers would experience different rates.



Table 8. Hierarchical Topology

Schemes     Average Hops   Average Latencies (seconds)
AR          1.43           0.18
Optimized   4.9            0.81
For example, ARs close to a busy web server are more likely to experience high
rates, and therefore a higher waiting time is required. Figures 7(a) and (b)
show the performance of ARs with an adaptive timer. As can be seen
from Figure 7(a), the lowest latency is obtained when α = 0.1. The end-to-end
latency remains fairly constant except when α = 1.0. This means that the bus stops
at ARs that have a high pickup probability. A comparison of TCP's round-trip
time (RTT) with and without the BusStation framework is also investigated.
The average RTT values with and without the BusStation framework are 2.33 and
2.50 seconds respectively. Hence other traffic such as TCP benefits when the
BusStation framework is enabled.

Fig. 6. (a) Average reduction in registration request latency, (b) Number of
packets reduced with ABus

Fig. 7. (a) Average latency with varying α values, (b) Average packet reduction
with varying α values
5 Conclusion

This study has investigated the application of intra-network processing in the
context of mobile communications, specifically routing. The main limitations of
the Mobile IP models have been identified and we have shown that ADS can
overcome these limitations. We have shown the importance of fast rerouting and
notification of CHs, especially in a scenario where both communicating parties
(MH and CH) are mobile. Apart from that, we have shown the benefits of two
novel schemes: the BusStation framework and efficient delivery of binding updates.
Abstract. There are currently several impediments delaying the rapid
introduction of Active Networks. A basic one is the impossibility of
updating the Internet overnight, even the access nodes only, to support
active network applications. The deployment of active networks will be
gradual, beginning with only a few active nodes. However, the provision
of active services requires that the traffic visits active nodes. Clearly, a
ubiquitous service cannot be provided with just a few active nodes and
current Internet routing, that is, with the routing used to cater for fixed
hosts. In the case of mobile hosts using Mobile IP, there is a way to
circumvent the need to update a substantial number of Internet routers
and still offer omnipresent active services. Given that Mobile IP is seen as
a fundamental capability of almost every future end-user host, that
users will most certainly appreciate the mobility support, and that
active services are most suited to mobile environments, we can draw
the following conclusion: an active network "overlay" based on Mobile
IP routing both reflects user needs and removes the need
for extensive updating of Internet nodes prior to active network service
provision. This paper presents a viable architecture for a Mobile IP
based active network overlay, capable of dynamically assigning
suitably located active nodes to cater for traffic sources involved in a
particular session. The design and implementation of the proposed
architecture will be carried out in Eurescom project P926 (Caspian).



1 Introduction 

Most of the proposed active network architectures aim for a wholesale updating of
the Internet infrastructure: the "activeness" of a substantial part of the Internet
is a prerequisite for the provision of many active services [1][2][3][4]. These kinds
of architectures are currently tested in e.g. ABone [5]. However, the active network
paradigm is a very tempting solution for services requiring a large degree of
dynamism at the implementation side. An early introduction of such a technology can
prove to be a better course of action than resorting to current solutions, say using a
"brute force" approach in ubiquitous custom-service deployment.

H. Yasuda (Ed.): IWAN 2000, LNCS 1942, pp. 416-422, 2000. 
A "lighter" way to deploy active network services is therefore in demand. "Light"
means here that the existing Internet should be left as intact as possible and that the
number of necessary special nodes should be small. The characteristics of Internet
routing do not allow simple directing of particular traffic to selected network nodes
(active nodes). To circumvent this problem, the current solutions include an overlay
network like X-Bone [6] or ANON [7]. However, Mobile IP [8] directs traffic through
known nodes, that is, the nodes hosting Mobile IP agents. The traffic flows can be
"grabbed" into the active space of these nodes, to be treated as wished. This can be
seen as a form of "overlay" networking, since the existing Internet infrastructure need
not be touched. The requirement for Mobile IP is not a serious limitation, since it is
more than likely that end-users will embrace Mobile IP in the near future. From the
user-services point of view there is no need to consider non-mobile hosts (i.e. fixed
hosts) at all.

Ideally, mobility issues should not diminish the flexibility of active nodes. The
greatest flexibility is achieved when as much as possible of the functionality of active
nodes resides in active space. On the other hand, some parts of the functionality may
prove to be perfected to a degree where there is no need for frequent updates or
modifications. Parts of the mobility functionality may fall into this category. So, at
first, the required Mobile IP related functionality in active nodes could be quite
non-active. If a need for constant updating of some Mobile IP functionality is
recognised, that functionality will be implemented as active. In any case, an
implementation of Mobile IP-like routing is required for the creation of the proposed
active overlay.



2 Architecture 

In the presented scheme it is possible to keep the current Mobile IP interactions
unaltered. The difference lies elsewhere, namely in new mobile agents with novel
internal functionality. Towards usual Mobile IP entities supporting Home Agent
(HA), Foreign Agent (FA) or Mobile Node (MN) behaviour, the new agents exhibit
the associated standard Mobile IP roles. The new agents are located at the operator
side and they include: Extended Home Agent (EHA), Bridgehead Agent (BA) and
Correspondent Agent (CA). The agents exist as software processes, possibly as active
ones, in active nodes (see Figure 1). In the customer domain only negligible
modifications are required. These modifications do not alter the Mobile IP
functionality as seen from the foreign network point of view, and the changes for MN
implementations are minor.

EHA is required to act as an active "front-end" to HA. In the usual case all home 
network external mobile traffic is forced to traverse through the active node hosting 
EHA, and therefore active services can be provided to the home network. In rare 
cases where an active node (hosting EHA) cannot be simply placed at the operator 
end of an access network, HA must be modified to forward registration packets to 
EHA. This can be considered as a minor modification of Mobile IP, but it affects HA 
only. EHA is also responsible for locating active nodes, with the help of e.g. directory 
services, for other overlay agents (BA, CA, see below). These agents will be 
instantiated "on the fly" into the selected active nodes. 
BA provides MN with active access to the Internet. It is instantiated into an active
node near MN when required. Using BA, value-added services available at the home
network can be brought to the vicinity of MN, say a user-specific firewall.

CA is invoked by request of MN into a "beachhead" location, i.e. into an active 
node near a Correspondent Node, CN. Other MN-specific active services may make 
use of the active node supporting CA, e.g. transcoding and filtering traffic destined to 
MN near its source. CA can provide MN location hiding, too. 

Requirements that can be fulfilled by the described architecture include: 

• Possibility to assign an active node in the proximity of each traffic source (MN, 
CNs) and instantiate the agents into them 

• Operations of the overlay shall be transparent from the foreign network point of 
view 

• MN shall not be dependent on the availability of the active access services 

• The impact on (future) Mobile IP security arrangements (e.g. authentication of 
different parties) shall be negligible 

• All mobile traffic traverses through EHA, BA and optionally CA 

• Elimination of triangle-routing, present in basic IPv4 Mobile IP. The solution shall
be based on mechanisms proposed in related IETF drafts, e.g.
draft-ietf-mobileip-optim-**.txt. The concrete requirement is that CN is capable of
processing Mobile IP "binding update" requests. This function should (an RFC term)
be supported by IPv6

• Allows future extensions to provide seamless roaming and hand-over 

• The user shall be able to request active services 

• It shall be possible for the mobile user to request (network) location hiding from 
CN 

• Suitability for multi-operator environment 

• Directory service for locating suitable active nodes 

The main elements, their relations and a (very) coarse functional overview of the
proposed architecture are shown in Fig. 1. The situation is as follows: the EHA has
monitored the registration process (1). After this, EHA has located a suitable active
node (AN) near MN and instantiated a BA for MN into it. When communicating with
CN, a CA may be instantiated in addition. After these preliminaries the traffic
between CN and MN is routed through CA and BA (2). Wherever possible, the
messages related to Mobile IP interactions are not altered by the new agents, to avoid
complications with future Mobile IP security arrangements. No serious difficulties are
foreseen. The firewalls (FW) in the figure just reflect the current practice.
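
The sequence described above can be summarised in a few lines of code. The
sketch is an assumption-laden illustration rather than the Caspian design: the
class names, the dictionary-backed directory and the "BA:"/"CA:" labels are
hypothetical, and Python is used here only for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ActiveNode:
    name: str
    agents: List[str] = field(default_factory=list)

    def instantiate(self, agent: str) -> None:
        self.agents.append(agent)


class Directory:
    """Hypothetical directory service: network name -> nearby active node."""

    def __init__(self, nearest: Dict[str, ActiveNode]) -> None:
        self._nearest = nearest

    def locate(self, network: str) -> Optional[ActiveNode]:
        return self._nearest.get(network)


class ExtendedHomeAgent:
    def __init__(self, directory: Directory) -> None:
        self.directory = directory

    def on_registration(self, mn: str, foreign_network: str) -> Optional[ActiveNode]:
        """Step (1): after MN's registration is observed, place a BA near MN."""
        node = self.directory.locate(foreign_network)
        if node is not None:          # MN must keep working without active access
            node.instantiate(f"BA:{mn}")
        return node

    def on_correspondent(self, mn: str, cn_network: str) -> Optional[ActiveNode]:
        """Optionally place a CA near the correspondent node."""
        node = self.directory.locate(cn_network)
        if node is not None:
            node.instantiate(f"CA:{mn}")
        return node
```

Returning None when no active node is found mirrors the requirement that MN
shall not depend on the availability of the active access services.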



3 Experiment 

The design and building of an active network overlay for mobile end devices is 
carried out in an experiment within Eurescom project P926 [9]. The objective is to 
prove the feasibility of the proposed architecture. 

Some of the requirements placed on the architecture are straightforward to verify in 
the experiment, e.g. that the operations of the overlay shall be transparent from the 
foreign network point of view and that the mobile node shall not be dependent on the 
availability of the active access services. 
Fig. 1. Active Mobile Overlay architecture



To verify that the system based on this architecture delivers active services, say
dynamically invoked and configured firewalling, is a slightly more complicated task.
This is so because the various service possibilities are not known in advance.
However, at least a rudimentary form of some service introduced for the project will
be available when needed by this experiment. The operation of such a service will be
the proof of success.

The experiment utilises active nodes based on Linux with a 2.3.x kernel and Netfilter
[10]. The active nodes support the EHA, BA and CA roles, and the implementations
will be coded in Java. This means that the performance is not expected to reach the
level required for practical applications. However, the principles and methods learned
can later be applied to any language or platform in "real world" systems. XML is
used to define active network related message formats. End devices (PCs with Linux)
are installed with a slightly modified Mobile IP. The Mobile IP implementation
selected for the experiment is Dynamics (developed at Helsinki University of
Technology) [11]. The role of a particular Mobile IP implementation is however
supportive; the Dynamics code is not necessarily installed into active nodes, but into
the mobile node and the home and foreign networks. In addition, Dynamics is not the
only suitable Mobile IP implementation. Any other software can be utilised, provided
that the necessary modifications are possible. Dynamics seems to be the best choice
given the peripheral devices (Linux machines) and the anticipated need for slight
configuration and functionality modifications. These alterations may be required in
MN. This does not mean violating Mobile IP protocol specifications.

In active nodes, the Mobile IP "signalling" packets are studied, maybe modified 
and used to trigger actions. The software required for this may utilise functions found 
in Dynamics (say, authentication) if so wished. However, the roles of the active nodes 
differ from the basic Mobile IP, i.e. the Dynamics code does not cater for these new 
functions. 
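
As an illustration of what "studying" a signalling packet involves, the sketch
below decodes the fixed part of a Mobile IP Registration Request. The field
layout follows our reading of the Mobile IP specification (RFC 2002: type,
flags, lifetime, home address, home agent, care-of address, 64-bit
identification); extensions such as authentication are ignored. It is neither
Dynamics code nor the planned Java implementation, and the function and class
names are hypothetical.

```python
import socket
import struct
from typing import NamedTuple

REG_REQUEST = 1                          # message type of a Registration Request
_FIXED = struct.Struct("!BBH4s4s4sQ")    # 24-byte fixed part, network byte order


class RegistrationRequest(NamedTuple):
    flags: int
    lifetime: int             # seconds
    home_address: str
    home_agent: str
    care_of_address: str
    identification: int


def parse_registration_request(data: bytes) -> RegistrationRequest:
    """Decode the fixed part of a Registration Request; extensions are skipped."""
    if len(data) < _FIXED.size:
        raise ValueError("truncated registration request")
    msg_type, flags, lifetime, home, agent, coa, ident = _FIXED.unpack_from(data)
    if msg_type != REG_REQUEST:
        raise ValueError("not a registration request")
    return RegistrationRequest(
        flags, lifetime,
        socket.inet_ntoa(home), socket.inet_ntoa(agent), socket.inet_ntoa(coa),
        ident,
    )
```

An active node could use the decoded care-of address to decide which nearby
node should host the BA, leaving the packet itself unmodified.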

The scale of the testbed is as follows (see Fig. 2): 

• Two hosts for MN (CN role as well), two for HA/FA and two for EHA/BA/CA 

• Three IP-subnetworks 




Fig. 2. Testbed set-up 

Note that in Fig. 2 the arrangement is in the form of a "mirror image", i.e.
mobile hosts act in dual roles (MN and/or CN) depending on the situation.



4 Initial Results 

The tables below present the involved functions and functional blocks with their
completion status. The intention of the experimentation is to prove the feasibility, and
the scheme is based largely on "signalling" type mechanisms. Therefore operational
measurements, like those related to performance, are not considered relevant at
this stage.

The next table describes the implementation status of the additional Active Node
entities of the Mobile Active Overlay.
Table 1. Implementation status of entities

Entity   Status                           Comment
EHA      Mostly ready, partially tested   route opt. incomplete
BA       Not ready yet                    will be based on EHA
CA       Not ready yet


The following table presents the status of the required basic components:

Table 2. Status of basic components

Component                      Status                     Comment
Mobile IP registration         Tested                     initial MN registration
IPIP tunnelling [12]           Tested
Mobile IP route optimisation   Partially ready & tested   daemon for CN ready
Active service configuration   Not ready yet              Specifications not ready



The next table lists in detail all registration phases that are to be tested. 



Table 3. Status of different registration phases. (* BA not implemented yet)

Registration Phase        Status           Comment
Initial MN registration   Tested           MN<->FA<->EHA<->HA
2nd phase registration    Not ready yet*   MN<->FA<->BA<->EHA<->HA
Registration completion   Not ready yet*   MN<->FA<->BA<->HA



Although the work is still in progress, it can be already said that no basic faults are 
likely to appear in the proposed concept. The most critical pieces of software (EHA, 
initial MN registration) are now complete enough to warrant this. 



5 Summary 

An architecture making possible the provision of active network services with a 
Mobile IP based overlay along with the status of the ongoing experimental 
confirmation are presented. An overlay architecture like this will provide an 
opportunity for network operators to easily offer active services with a minimum 
impact on the existing Internet and Mobile IP. The architecture is being verified in an 
experiment within the Eurescom project P926 (Caspian). 

The result of the experiment will consist of a set of active nodes capable of forming
a mobile active overlay. This means that user devices (PCs with a slightly modified
Mobile IP installed) will have all of their traffic routed through the nearest active
nodes connected to the Internet. Active services can be instantiated into these nodes
to cater for individual users/MNs.

After completing a successful proof of concept, it will be possible to develop "real 
world" implementations for active services. 